Research Article

FastCNN: Towards Fast and Accurate Spatiotemporal Network for HEVC Compressed Video Enhancement

Published: 25 February 2023

Abstract

Deep neural networks have achieved remarkable success in HEVC compressed video quality enhancement. However, most existing multiframe-based methods either deliver unsatisfactory results or consume significant resources to leverage the temporal information of neighboring frames. For practical deployment, a thorough investigation of video quality enhancement network design with respect to enhancement performance, model parameters, and running speed is essential. In this article, we first propose an efficient alignment module that quickly and accurately aggregates the spatiotemporal information of neighboring frames. Motivated by the observation that deformable offsets of adjacent pixels are correlated, the proposed module estimates the offsets progressively in a lower-resolution space. Then, the quantization parameter (QP), which encodes prior knowledge of the compression level, is used to guide enhancement of the aligned features. By combining alignment feature distillation with residual feature correction, we obtain an efficient QP attention block. To save network storage space, we design a hash buffer that caches QP embedding features. These efficient components allow our network to effectively exploit temporal redundancies and achieve favorable enhancement capability while maintaining a lightweight structure and fast running speed. Extensive experiments demonstrate that the proposed approach outperforms state-of-the-art methods by 0.09 to 0.11 dB across different QPs, while reducing inference time by up to 69%.
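The hash buffer idea can be sketched in plain Python. This is an illustrative sketch, not the paper's implementation: `embed_qp` and `QPHashBuffer` are hypothetical names, and the toy sinusoidal code stands in for whatever learned embedding network produces the QP features. The point the sketch makes is only the caching behavior: since a video sequence reuses a small set of QP values, each QP's embedding is computed once and then served from the buffer.

```python
# Sketch of a QP embedding hash buffer (hypothetical names; the real
# network would replace `embed_qp` with a learned embedding layer).
import math

def embed_qp(qp, dim=8):
    """Toy stand-in for a learned QP embedding: a sinusoidal code."""
    return [math.sin(qp / (10.0 ** (i / dim))) for i in range(dim)]

class QPHashBuffer:
    """Caches QP embedding features so each QP is embedded only once."""
    def __init__(self, embed_fn=embed_qp):
        self._embed_fn = embed_fn
        self._cache = {}   # qp -> embedding feature
        self.misses = 0    # number of actual embedding computations

    def get(self, qp):
        if qp not in self._cache:
            self._cache[qp] = self._embed_fn(qp)
            self.misses += 1
        return self._cache[qp]

buf = QPHashBuffer()
f37_a = buf.get(37)  # computed on first request
f37_b = buf.get(37)  # served from the buffer, no recomputation
f22 = buf.get(22)    # a different QP triggers one more computation
```

Because HEVC QPs are small integers (0 to 51), the buffer stays tiny, which is what makes trading a lookup for a forward pass through the embedding layers worthwhile.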



Published in ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 3 (May 2023), 514 pages.

ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3582886
Editor: Abdulmotaleb El Saddik

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
• Received: 14 May 2022
• Revised: 19 September 2022
• Accepted: 22 October 2022
• Online AM: 27 October 2022
• Published: 25 February 2023
