Deep Inter Prediction with Error-Corrected Auto-Regressive Network for Video Coding

Published: 23 January 2023
Abstract

Modern codecs remove temporal redundancy in a video via inter prediction, i.e., searching previously coded frames for similar blocks and storing motion vectors to reduce the bit-rate. However, existing codecs adopt block-level motion estimation, in which a block is regressed linearly from reference blocks and therefore fails to handle non-linear motion. In this article, we generate virtual reference frames (VRFs) from previously reconstructed frames via deep networks to offer an additional candidate that is not constrained to a linear motion structure and thus significantly improves coding efficiency. More specifically, we propose a novel deep Auto-Regressive Moving-Average (ARMA) model, the Error-Corrected Auto-Regressive Network (ECAR-Net), which combines the strengths of conventional statistical ARMA models and deep networks for reference frame prediction. Like conventional ARMA models, ECAR-Net consists of two stages: an Auto-Regression (AR) stage and an Error-Correction (EC) stage. The first predicts the signal at the current time-step from previously reconstructed frames, while the second compensates the output of the AR stage to recover finer details. Unlike statistical AR models, which capture only short-term temporal dependency, the AR stage of ECAR-Net is further equipped with a long-term dynamics mechanism, in which long-range temporal information helps predict motion more accurately. Furthermore, ECAR-Net works in a configuration-adaptive way, i.e., it uses different dynamics and error definitions for the Low Delay B and Random Access configurations, which improves its adaptivity and generality across diverse coding scenarios. With the well-designed network, our method achieves on average 5.0% and 6.6% BD-rate savings over HEVC for the luma component under the Low Delay B and Random Access configurations, respectively, and on average 1.54% BD-rate saving over VVC.
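The two-stage structure described above — an AR stage that extrapolates from past reconstructions and an EC stage that compensates the remaining error — mirrors the classical ARMA decomposition. As an illustration only (this is a toy scalar ARMA sketch, not the paper's deep network; the function names, coefficients, and 1-D signal are hypothetical), the division of labor can be written as:

```python
import numpy as np

def ar_predict(history, ar_coeffs):
    """AR stage: linearly extrapolate from the p most recent samples."""
    p = len(ar_coeffs)
    # history[-1] pairs with ar_coeffs[0], history[-2] with ar_coeffs[1], ...
    return float(np.dot(ar_coeffs, history[-p:][::-1]))

def ecar_predict(history, ar_coeffs, past_errors, ma_coeffs):
    """EC stage: compensate the AR output with a moving average of the
    q most recent prediction errors (the MA part of ARMA).
    Assumes past_errors is empty or holds at least q values."""
    ar_out = ar_predict(history, ar_coeffs)
    if not past_errors:  # no errors observed yet: pure AR output
        return ar_out
    q = len(ma_coeffs)
    correction = float(np.dot(ma_coeffs, past_errors[-q:][::-1]))
    return ar_out + correction
```

For example, with AR order 1 ("repeat the last sample") and MA order 1, `ecar_predict([1, 2, 3, 4], [1.0], [], [0.5])` returns 4.0; after observing an error of 1.0 at that step, the next prediction becomes 5.0 + 0.5·1.0 = 5.5. In ECAR-Net the linear AR and MA terms are replaced by learned networks operating on whole frames, but the coarse-prediction-then-error-compensation structure is the same.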
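The BD-rate figures quoted above come from the standard Bjontegaard metric, which averages the horizontal (rate) gap between two rate-distortion curves. A common sketch of the computation — fit log-rate as a cubic polynomial of PSNR for each codec and integrate the difference over the overlapping quality range — is shown below; the function name and sample values are illustrative, not taken from the paper:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average percentage bit-rate change of the test codec vs. the anchor
    (negative = bit-rate saving), via cubic log-rate-vs-PSNR fits."""
    lr_a = np.log(rate_anchor)
    lr_t = np.log(rate_test)
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate both fits over the overlapping PSNR range
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0  # percent rate change
```

As a sanity check, a codec that reaches every PSNR point at exactly half the anchor's bit-rate yields a BD-rate of −50%.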


Published in: ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1s (February 2023), 504 pages. ISSN: 1551-6857. EISSN: 1551-6865. DOI: 10.1145/3572859. Editor: Abdulmotaleb El Saddik.


Publisher: Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 23 January 2023
      • Online AM: 1 April 2022
      • Accepted: 21 March 2022
      • Revised: 19 February 2022
      • Received: 25 October 2021
Published in TOMM Volume 19, Issue 1s

      Qualifiers

      • research-article
      • Refereed
