iDAM: Iteratively Trained Deep In-loop Filter with Adaptive Model Selection

Published: 23 January 2023

Abstract

With the rapid development of neural-network-based machine learning algorithms, deep learning methods are being applied well beyond familiar artificial intelligence applications such as face recognition and autonomous driving. Recently, deep learning models have been investigated intensively to improve compression efficiency in video coding, especially at the in-loop filtering stage. Although deep learning-based in-loop filtering methods in prior art have already shown remarkable potential for video coding, the content propagation issue has not yet been well recognized or addressed. Content propagation refers to the fact that the content of reference frames is propagated to the frames that refer to them, which typically leads to over-filtering. In this article, we develop an iteratively trained deep in-loop filter with adaptive model selection (iDAM) to address the content propagation issue. First, we propose an iterative training scheme that enables the network to gradually take into account the impact of content propagation. Second, we propose a filter selection mechanism that allows a block to select from a set of candidate filters with different filtering strengths. In addition, we propose a novel approach to designing a conditional in-loop filtering method that can handle multiple quality levels with a single model and serve the filter selection functionality by modifying the input parameters. Extensive experiments on top of the latest video coding standard (Versatile Video Coding, VVC) were conducted to evaluate the proposed techniques. Compared with VTM-11.0, our scheme achieves a new state of the art, leading to {7.91%, 20.25%, 20.44%}, {11.64%, 26.40%, 26.50%}, and {10.97%, 26.63%, 26.77%} BD-rate reductions on average for {Y, Cb, Cr} under all-intra, random-access, and low-delay configurations, respectively. To the best of our knowledge, the proposed iDAM scheme provides the highest coding performance among all existing solutions.
In addition, the syntax elements of the proposed scheme were adopted at the 76th meeting of the Audio Video coding Standard (AVS) held this year.
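The adaptive model selection described above can be illustrated with a minimal sketch: the encoder tries each candidate filter of a different strength on a reconstructed block, keeps the one that minimizes distortion against the original, and signals its index so the decoder can apply the same filter. The `box_smooth` denoiser, the strength values, and the function names below are hypothetical stand-ins for the paper's trained networks, chosen only to make the selection mechanism concrete.

```python
import numpy as np

def box_smooth(block):
    """3x3 box filter with edge replication (illustrative stand-in for a trained filter)."""
    padded = np.pad(block, 1, mode="edge")
    h, w = block.shape
    out = np.zeros(block.shape, dtype=float)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

def make_candidates(strengths=(0.0, 0.5, 1.0)):
    """Candidate filters of increasing strength; strength 0.0 means 'filter off'."""
    return [lambda b, w=w: (1.0 - w) * b + w * box_smooth(b) for w in strengths]

def select_filter(recon_block, orig_block, candidates):
    """Encoder-side selection: pick the candidate minimizing MSE vs. the original;
    the chosen index would be signaled in the bitstream."""
    errors = [np.mean((f(recon_block) - orig_block) ** 2) for f in candidates]
    best = int(np.argmin(errors))
    return best, candidates[best](recon_block)
```

Because the "filter off" candidate (strength 0.0) is always in the set, the selected output can never be worse than the unfiltered reconstruction, which mirrors how per-block on/off control guards against over-filtering.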

      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1s
        February 2023
        504 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3572859
        • Editor: Abdulmotaleb El Saddik

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 January 2023
        • Online AM: 1 April 2022
        • Accepted: 24 March 2022
        • Revised: 18 February 2022
        • Received: 18 November 2021

        Qualifiers

        • research-article
        • Refereed
