Abstract
Modern codecs remove the temporal redundancy of a video via inter prediction, i.e., by searching previously coded frames for similar blocks and storing motion vectors to reduce the bit rate. However, existing codecs adopt block-level motion estimation, in which a block is predicted as a linear combination of reference blocks, and which therefore cannot handle non-linear motion. In this article, we generate virtual reference frames (VRFs) from previously reconstructed frames via deep networks to offer an additional reference candidate that is not constrained to a linear motion structure and that significantly improves coding efficiency. More specifically, we propose a novel deep Auto-Regressive Moving-Average (ARMA) model, the Error-Corrected Auto-Regressive Network (ECAR-Net), which combines the strengths of conventional statistical ARMA models and deep networks for reference frame prediction. Similar to conventional ARMA models, ECAR-Net consists of two stages: an Auto-Regression (AR) stage and an Error-Correction (EC) stage. The AR stage predicts the signal at the current time step from previously reconstructed frames, while the EC stage compensates the output of the AR stage to recover finer details. Unlike statistical AR models, which capture only short-term temporal dependency, the AR stage of ECAR-Net is augmented with a long-term dynamics mechanism that uses long-range temporal information to predict motion more accurately. Furthermore, ECAR-Net works in a configuration-adaptive way, i.e., it uses different dynamics and error definitions for the Low Delay B and Random Access configurations, which improves its adaptivity and generality across diverse coding scenarios. Our method achieves average BD-rate savings of 5.0% and 6.6% over HEVC for the luma component under the Low Delay B and Random Access configurations, respectively, and an average BD-rate saving of 1.54% over VVC.
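The two-stage structure described above can be illustrated with a classical, purely linear ARMA-style predictor. This is only a toy sketch and not the ECAR-Net itself: in the article both stages are deep networks, whereas here they are fixed weighted sums. The names `ar_stage`, `ec_stage`, and `predict_next`, the flattened-list frame representation, and the coefficient values are all assumptions made for illustration.

```python
from typing import List, Sequence

Frame = List[float]  # hypothetical representation: a frame flattened to pixel values


def ar_stage(past_frames: Sequence[Frame], ar_coeffs: Sequence[float]) -> Frame:
    """AR stage (toy): weighted sum of past frames, most recent frame first."""
    if len(past_frames) != len(ar_coeffs):
        raise ValueError("one AR coefficient per past frame is required")
    n = len(past_frames[0])
    return [sum(a * f[i] for a, f in zip(ar_coeffs, past_frames)) for i in range(n)]


def ec_stage(ar_pred: Frame, past_errors: Sequence[Frame],
             ma_coeffs: Sequence[float]) -> Frame:
    """EC stage (toy): add a weighted sum of past AR prediction errors."""
    if len(past_errors) != len(ma_coeffs):
        raise ValueError("one MA coefficient per past error frame is required")
    return [p + sum(b * e[i] for b, e in zip(ma_coeffs, past_errors))
            for i, p in enumerate(ar_pred)]


def predict_next(frames: Sequence[Frame], ar_coeffs: Sequence[float],
                 ma_coeffs: Sequence[float]) -> Frame:
    """Predict the frame following `frames` (chronological order)."""
    p, q = len(ar_coeffs), len(ma_coeffs)
    total = len(frames)
    if total < p + q:
        raise ValueError("need at least p + q past frames")

    def ar_at(t: int) -> Frame:
        # AR prediction of frame t from frames t-1, ..., t-p.
        return ar_stage([frames[t - k] for k in range(1, p + 1)], ar_coeffs)

    # Errors the AR stage made on the q most recent frames, most recent first.
    past_errors = []
    for j in range(1, q + 1):
        t = total - j
        pred = ar_at(t)
        past_errors.append([x - y for x, y in zip(frames[t], pred)])

    return ec_stage(ar_at(total), past_errors, ma_coeffs)


# Pixel 0 follows t^2 (non-linear in time); pixel 1 is static.
frames = [[0.0, 5.0], [1.0, 5.0], [4.0, 5.0]]
ar_only = ar_stage([frames[2], frames[1]], [2.0, -1.0])  # linear extrapolation
corrected = predict_next(frames, [2.0, -1.0], [1.0])     # AR plus error correction
```

In this toy case the true next frame is `[9.0, 5.0]`. Linear extrapolation alone gives `[7.0, 5.0]` and misses the accelerating pixel; adding the previous prediction error recovers `[9.0, 5.0]`, which mirrors the role of the EC stage in compensating the AR output.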
Deep Inter Prediction with Error-Corrected Auto-Regressive Network for Video Coding