Abstract
Recently, attention mechanisms have been increasingly adopted in convolutional neural networks (CNNs), and representative mechanisms such as channel attention (CA) and spatial attention (SA) have been widely applied to single image super-resolution (SISR). However, existing architectures apply these attention mechanisms to SISR directly, without much consideration of the inherent characteristics of the task, which limits their representational power. In this article, we propose a novel kernel attention module (KAM) for SISR, which enables the network to adjust its receptive field size to the scale of the input by dynamically selecting the appropriate kernel. We then stack multiple kernel attention modules with group and residual connections to form a novel architecture for SISR, which lets the network learn more discriminative representations by filtering information under different receptive fields. As a result, the network is more sensitive to multi-scale features, so a single network can handle multi-scale SR tasks by predefining the upscaling modules. Other attention mechanisms in super-resolution are also investigated and illustrated in detail in this article. Extensive benchmark evaluation shows that, thanks to the kernel attention mechanism, our method outperforms other state-of-the-art methods.
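The abstract does not spell out the internals of the KAM, so the following is only a minimal sketch of the general idea of kernel selection, assuming a generic "split, fuse, select" scheme on a 1-D signal. The function names, the filter values, and the scalar `gate_weights` (a stand-in for learned gating layers) are all hypothetical and are not taken from the paper.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D correlation that keeps the input length (odd kernels only)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kernel_attention(signal, kernels, gate_weights):
    """Blend branches with different receptive fields using softmax weights.

    kernels:      one odd-length filter per branch (hypothetical values).
    gate_weights: one scalar per branch, an assumed stand-in for the learned
                  layers that would map the pooled descriptor to branch scores.
    """
    # Split: run the same input through filters of different sizes.
    branches = [conv1d_same(signal, k) for k in kernels]
    # Fuse: sum the branches, then pool to a single global descriptor.
    fused = [sum(values) for values in zip(*branches)]
    descriptor = sum(fused) / len(fused)
    # Select: softmax across branches gives one attention weight per kernel.
    attention = softmax([w * descriptor for w in gate_weights])
    output = [
        sum(a * branch[i] for a, branch in zip(attention, branches))
        for i in range(len(signal))
    ]
    return output, attention


# Example: a narrow (size 3) and a wide (size 5) averaging filter.
signal = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0]
kernels = [[1 / 3] * 3, [1 / 5] * 5]
output, attention = kernel_attention(signal, kernels, gate_weights=[1.0, -1.0])
```

Because the attention weights are computed from the input itself, different inputs shift the blend toward the narrow or the wide filter, which is the sense in which the effective receptive field adapts. A real module would operate on 2-D feature maps with learned convolutions and per-channel weights.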
Kernel Attention Network for Single Image Super-Resolution