Abstract
Video frame interpolation (VFI) is important for many video applications, yet it remains challenging even in the era of deep learning. Some existing VFI models directly adopt off-the-shelf lightweight network frameworks, so the synthesized in-between frames are blurry and contain artifacts caused by imprecise motion representation. Other VFI models rely on heavy architectures with large numbers of parameters, which prevents deployment on small terminals. To address these issues, we propose a local lightweight VFI network (L2BEC2) that leverages a bidirectional encoding structure with a channel attention cascade. Specifically, we improve visual quality by introducing a forward-and-backward encoding structure with a channel attention cascade to better characterize motion information. Furthermore, we apply a local lightweight strategy to the state-of-the-art Adaptive Collaboration of Flows (AdaCoF) model to reduce its parameter count. Compared with the original AdaCoF model, the proposed L2BEC2 achieves a performance gain with only one-third of the parameters and performs favorably against state-of-the-art methods on public datasets. Our source code is available at https://github.com/Pumpkin123709/LBEC.git.
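The channel attention mentioned in the abstract typically follows the squeeze-and-excitation pattern: globally pool each feature channel, pass the pooled vector through a small bottleneck, and rescale the channels by the resulting sigmoid gates. The following NumPy sketch illustrates one such attention block; the function name, weight shapes, and reduction ratio are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def channel_attention(x, w_reduce, w_expand):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    x        : feature map of shape (C, H, W)
    w_reduce : bottleneck weight, shape (C // r, C) for some reduction ratio r
    w_expand : expansion weight, shape (C, C // r)
    Returns the input feature map with each channel rescaled by a gate in (0, 1).
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    squeezed = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with ReLU, then a per-channel sigmoid gate
    hidden = np.maximum(0.0, w_reduce @ squeezed)
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))
    # Rescale: broadcast the per-channel gates over the spatial dimensions
    return x * gates[:, None, None]
```

Because every gate lies in (0, 1), the block can only attenuate channels, letting the network emphasize motion-relevant features by suppressing the rest; a cascade applies several such blocks in sequence along the encoder.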
Index Terms
L2BEC2: Local Lightweight Bidirectional Encoding and Channel Attention Cascade for Video Frame Interpolation