Abstract
Deep neural networks have achieved remarkable success in quality enhancement of HEVC-compressed video. However, most existing multiframe methods either deliver unsatisfactory results or consume significant resources to exploit the temporal information of neighboring frames. For practical deployment, the architecture of a video quality enhancement network must be investigated thoroughly with respect to enhancement performance, parameter count, and running speed. In this article, we first propose an efficient alignment module that quickly and accurately aggregates the spatiotemporal information of neighboring frames. Motivated by the observation that the offsets of adjacent pixels are correlated, the module estimates deformable offsets progressively in a lower-resolution space. The quantization parameter (QP), which provides prior knowledge of the compression level, is then used to guide the enhancement of the aligned features. By combining alignment feature distillation with residual feature correction, we obtain an efficient QP attention block. To reduce the storage footprint of the network, we design a hash buffer that stores the QP embedding features. These efficient components allow our network to exploit temporal redundancies effectively and achieve favorable enhancement capability while remaining lightweight and fast. Extensive experiments demonstrate that the proposed approach outperforms state-of-the-art methods by 0.09 to 0.11 dB across different QPs, while reducing inference time by up to 69%.
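Two of the ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that offsets predicted on a coarse grid are shared by neighboring full-resolution pixels (here via nearest-neighbor upsampling, with magnitudes rescaled), and that the QP embedding is a function of the integer QP alone, so it can be kept in a lookup table. The names `upsample_offsets`, `QPEmbeddingBuffer`, and `toy_embed` are hypothetical, and `toy_embed` only stands in for a learned embedding.

```python
from typing import Callable, Dict, List, Tuple

Offset = Tuple[float, float]  # (dy, dx) displacement in pixels


def upsample_offsets(offsets_lr: List[List[Offset]], scale: int) -> List[List[Offset]]:
    """Nearest-neighbor upsampling of a low-resolution offset field.

    Each low-resolution offset is shared by a scale x scale block of
    full-resolution pixels. Its magnitude is multiplied by `scale` because
    one low-resolution pixel spans `scale` full-resolution pixels.
    """
    offsets_hr: List[List[Offset]] = []
    for row in offsets_lr:
        hr_row: List[Offset] = []
        for dy, dx in row:
            # Widen: repeat the rescaled offset `scale` times horizontally.
            hr_row.extend([(dy * scale, dx * scale)] * scale)
        # Repeat the widened row `scale` times to cover the block height.
        offsets_hr.extend([list(hr_row) for _ in range(scale)])
    return offsets_hr


class QPEmbeddingBuffer:
    """Hypothetical lookup buffer keyed by the integer QP (an assumption)."""

    def __init__(self, embed_fn: Callable[[int], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._table: Dict[int, List[float]] = {}
        self.misses = 0  # number of times the embedding had to be computed

    def get(self, qp: int) -> List[float]:
        # Compute the embedding once per distinct QP, then reuse it.
        if qp not in self._table:
            self.misses += 1
            self._table[qp] = self._embed_fn(qp)
        return self._table[qp]


def toy_embed(qp: int) -> List[float]:
    # Placeholder for a learned embedding network (not from the paper).
    x = qp / 51.0  # HEVC QPs for 8-bit video lie in 0..51
    return [x, x * x]


if __name__ == "__main__":
    coarse = [[(0.5, -1.0), (0.0, 2.0)]]      # 1 x 2 low-resolution field
    fine = upsample_offsets(coarse, scale=2)  # 2 x 4 full-resolution field
    print(len(fine), len(fine[0]))

    buffer = QPEmbeddingBuffer(toy_embed)
    for qp in (37, 37, 32):
        buffer.get(qp)
    print(buffer.misses)
```

In this sketch the cost of offset estimation scales with the coarse grid rather than the full frame, and the embedding function runs once per distinct QP no matter how many frames share that QP.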
FastCNN: Towards Fast and Accurate Spatiotemporal Network for HEVC Compressed Video Enhancement