Abstract
Image denoising is a fundamental problem in computer vision and multimedia computation. Non-local filters are effective for image denoising, but existing deep learning methods that use non-local computation structures are mostly designed for high-level tasks and usually adopt global self-attention. Applied to image denoising, they incur high computational complexity and spend much of their computation on uncorrelated pixels. To address this problem and combine the advantages of non-local filters and deep learning, we propose a Convolutional Unbiased Regional (CUR) transformer. Based on the prior that the pixels similar to a given pixel are usually spatially close to it, our insights are that (1) partitioning the image into non-overlapping windows and performing regional self-attention reduces the search range of each pixel, and (2) pixels in different windows should be encouraged to communicate with each other. Based on these insights, the CUR transformer is built as a cascade of convolutional regional self-attention (CRSA) blocks with U-style short connections. In each CRSA block, we use convolutional layers to extract the query, key, and value features (Q, K, and V) of the input feature. We then partition the Q, K, and V features into local non-overlapping windows and perform regional self-attention within each window to obtain the output feature of the block. Across CRSA blocks, we perform unbiased window partition by changing the partition positions of the windows. Experimental results show that the CUR transformer outperforms state-of-the-art methods significantly on four low-level vision tasks: real and synthetic image denoising, JPEG compression artifact reduction, and low-light image enhancement.
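The windowed attention step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the Q, K, and V features are produced by random per-pixel linear projections rather than by trained convolutional layers, the changed partition position is modeled as a cyclic roll of the feature map (the abstract does not say how the positions are moved), and all function names (`window_partition`, `window_merge`, `regional_self_attention`) are hypothetical.

```python
import numpy as np


def window_partition(x, win, shift=0):
    """Split an (H, W, C) map into non-overlapping win x win windows.

    A non-zero shift cyclically rolls the map first, so the window borders
    fall at different positions (an assumed way to move the partition).
    Returns an array of shape (num_windows, win * win, C).
    """
    h, w, c = x.shape
    assert h % win == 0 and w % win == 0, "H and W must be multiples of win"
    if shift:
        x = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    x = x.reshape(h // win, win, w // win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, c)


def window_merge(windows, h, w, win, shift=0):
    """Inverse of window_partition: rebuild the (H, W, C) map."""
    c = windows.shape[-1]
    x = windows.reshape(h // win, w // win, win, win, c)
    x = x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)
    if shift:
        x = np.roll(x, shift=(shift, shift), axis=(0, 1))
    return x


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def regional_self_attention(q, k, v, win, shift=0):
    """Scaled dot-product attention restricted to each window."""
    h, w, c = q.shape
    qw = window_partition(q, win, shift)
    kw = window_partition(k, win, shift)
    vw = window_partition(v, win, shift)
    # Similarities are computed only between pixels of the same window.
    scores = qw @ kw.transpose(0, 2, 1) / np.sqrt(c)
    out = softmax(scores) @ vw
    return window_merge(out, h, w, win, shift)


# Usage: stand-in Q, K, V from random projections (assumption, see above).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 4))
wq, wk, wv = (rng.standard_normal((4, 4)) for _ in range(3))
q, k, v = feat @ wq, feat @ wk, feat @ wv
out_a = regional_self_attention(q, k, v, win=4)           # one partition
out_b = regional_self_attention(q, k, v, win=4, shift=2)  # moved partition
print(out_a.shape, out_b.shape)
```

For a map with N = H·W pixels and window size M, global self-attention compares N² pixel pairs, whereas the windowed form compares only N·M² pairs. Alternating the partition position between blocks lets pixels that were separated by a window border in one block share a window in the next.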
Index Terms
CUR Transformer: A Convolutional Unbiased Regional Transformer for Image Denoising