Abstract
360-degree video provides omnidirectional views on a bounding sphere and is therefore also called omnidirectional video. When watching omnidirectional video, viewers see only the content inside the viewport, which they move through head movement; that is, only a small portion of the 360-degree content is visible at any given time. Therefore, viewport quality is of particular importance for 360-degree videos. In this article, we propose a quality enhancement method for compressed 360-degree videos using viewport-based deep neural networks, named V-DNN. V-DNN consists of two main modules: a viewport prediction network (VPN) and a viewport quality enhancement network (VQEN). The VPN, based on spherical convolution and 2D convolution, generates potential viewports for omnidirectional video. The VQEN takes the current viewport and its reference viewports as input and generates an enhancement residual for the current viewport based on bidirectional offset prediction and spatio-temporal deformable convolutions. Compared with the HM16.16 baseline at QP = 37 under the Low Delay P (LDP) configuration, experimental results show that V-DNN achieves average gains of 0.605 dB in viewport-based ΔPSNR and 0.0139 in ΔMS-SSIM, which are 0.379 dB (59.63%) and 0.0073 (110.61%) higher, respectively, than those of the multi-frame quality enhancement (MFQE-2.0) scheme at QP = 37. Moreover, V-DNN consistently outperforms MFQE-1.0, MFQE-2.0, and the HM16.16 baseline at the other QPs in terms of ΔPSNR, ΔWS-PSNR, and ΔMS-SSIM.
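The abstract reports results as Δ metrics (ΔPSNR, ΔWS-PSNR). The sketch below is illustrative only and is not the authors' evaluation code: it assumes that ΔX is the score of the enhanced frame minus the score of the HEVC-compressed frame, both measured against the raw frame, and it uses the equirectangular (ERP) row weighting of WS-PSNR described in JVET-D0040 [61]. The function names (`psnr`, `erp_weights`, `ws_psnr`, `delta`) are hypothetical, inputs are assumed to be single-channel (e.g. luma) frames, and a viewport-based variant would apply `psnr` to extracted viewport images, whose extraction is not shown.

```python
import numpy as np


def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Conventional PSNR between two same-sized single-channel frames."""
    err = (ref.astype(np.float64) - test.astype(np.float64)) ** 2
    mse = float(np.mean(err))
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)


def erp_weights(height: int, width: int) -> np.ndarray:
    """Per-pixel WS-PSNR weights for an equirectangular (ERP) frame.

    Each row is weighted by the cosine of its latitude, so rows near the
    poles (which ERP stretches horizontally) count less than rows near
    the equator.
    """
    rows = np.arange(height, dtype=np.float64)
    w_row = np.cos((rows + 0.5 - height / 2.0) * np.pi / height)
    return np.repeat(w_row[:, None], width, axis=1)


def ws_psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """WS-PSNR of a single-channel ERP frame: PSNR with a latitude-weighted MSE."""
    w = erp_weights(*ref.shape)
    err = (ref.astype(np.float64) - test.astype(np.float64)) ** 2
    wmse = float(np.sum(w * err) / np.sum(w))
    return float("inf") if wmse == 0.0 else 10.0 * np.log10(peak ** 2 / wmse)


def delta(metric, raw, compressed, enhanced) -> float:
    """Gain of the enhanced frame over the compressed one, both scored against raw."""
    return metric(raw, enhanced) - metric(raw, compressed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.uniform(0, 255, size=(64, 128))
    # Synthetic stand-ins: larger noise mimics coding artifacts,
    # smaller noise mimics a partially restored frame.
    compressed = raw + rng.normal(0, 5, size=raw.shape)
    enhanced = raw + rng.normal(0, 2, size=raw.shape)
    print("dPSNR   :", delta(psnr, raw, compressed, enhanced))
    print("dWS-PSNR:", delta(ws_psnr, raw, compressed, enhanced))
```

Weighting each row by the cosine of its latitude compensates for the oversampling of ERP near the poles; when the error is uniform over the frame, WS-PSNR and PSNR coincide.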
- [1] . 2018. Content-adaptive 360-degree video coding using hybrid cubemap projection. In Proceedings of the Picture Coding Symposium. 313–317.
- [2] . 2020. λ-domain perceptual rate control for 360-degree video compression. IEEE Journal of Selected Topics in Signal Processing 14, 1 (2020), 130–145.
- [3] . 2019. Viewport proposal CNN for 360° video quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 169–178.
- [4] . 2010. Study of subjective and objective quality assessment of video. IEEE Transactions on Image Processing 19, 6 (2010), 1427–1441.
- [5] . 2016. Video quality evaluation methodology and verification testing of HEVC compression performance. IEEE Transactions on Circuits and Systems for Video Technology 26, 1 (2016), 76–90.
- [6] . 2018. Saliency-guided complexity control for HEVC decoding. IEEE Transactions on Broadcasting 64, 4 (2018), 865–882.
- [7] . 2007. Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images. IEEE Transactions on Image Processing 16, 5 (2007), 1395–1411.
- [8] . 2013. Compression artifact reduction by overlapped-block transform coefficient estimation with block similarity. IEEE Transactions on Image Processing 22, 12 (2013), 4613–4626.
- [9] . 2014. Post-processing for blocking artifact reduction based on inter-block correlation. IEEE Transactions on Multimedia 16, 6 (2014), 1536–1548.
- [10] . 2015. High performance loop filter for HEVC. In Proceedings of the IEEE Conference on Image Processing. 1905–1909.
- [11] . 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 1649–1668.
- [12] . 2021. Joint optimization for SSIM-based CTU-level bit allocation and rate distortion optimization. IEEE Transactions on Broadcasting 67, 2 (2021), 500–511.
- [13] . 2022. Color-sensitivity-based rate-distortion optimization for H.265/HEVC. IEEE Transactions on Circuits and Systems for Video Technology 32, 2 (2022), 802–812.
- [14] . 2015. Compression artifacts reduction by a deep convolutional network. In Proceedings of the IEEE Conference on Computer Vision. 576–584.
- [15] . 2017. An efficient deep convolutional neural networks model for compressed image deblocking. In Proceedings of the IEEE International Conference on Multimedia and Expo. 1320–1325.
- [16] . 2020. Reduction of JPEG compression artifacts based on DCT coefficients prediction. Neurocomputing 384 (2020), 335–345.
- [17] . 2017. Decoder-side HEVC quality enhancement with scalable convolutional neural network. In Proceedings of the IEEE Conference on Multimedia and Expo. 817–822.
- [18] . 2021. Combining progressive rethinking and collaborative learning: A deep framework for in-loop filtering. IEEE Transactions on Image Processing 30 (2021), 4198–4211.
- [19] . 2021. MFRNet: A new CNN architecture for post-processing and in-loop filtering. IEEE Journal of Selected Topics in Signal Processing 15, 2 (2021), 378–387.
- [20] . 2020. Efficient in-loop filtering based on enhanced deep convolutional neural networks for HEVC. IEEE Transactions on Image Processing 29 (2020), 5352–5366.
- [21] . 2021. Frame-wise CNN-based filtering for intra-frame quality enhancement of HEVC videos. IEEE Transactions on Circuits and Systems for Video Technology 31, 6 (2021), 2100–2113.
- [22] . 2017. Real-time video super-resolution with spatio-temporal networks and motion compensation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2848–2857.
- [23] . 2018. Spatio-temporal transformer network for video restoration. In Proceedings of the European Conference on Computer Vision. Springer International Publishing, 111–127.
- [24] . 2019. Video enhancement with task-oriented flow. International Journal of Computer Vision 127, 8 (2019), 1106–1125.
- [25] . 2019. Spherical domain rate-distortion optimization for omnidirectional video coding. IEEE Transactions on Circuits and Systems for Video Technology 29, 6 (2019), 1767–1780.
- [26] . 2017. Coding optimization based on weighted-to-spherically-uniform quality metric for 360 video. In Proceedings of the IEEE Conference on Visual Communications and Image Processing. 1–4.
- [27] . 2017. Spherical domain rate-distortion optimization for 360-degree video coding. In Proceedings of the IEEE Conference on Multimedia and Expo. 709–714.
- [28] . 2022. Viewport-based CNN: A multi-task approach for assessing 360 degree video quality. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 4 (2022), 2198–2215.
- [29] . 2018. GBVS360, BMS360, ProSal: Extending existing saliency prediction models from 2D to omnidirectional images. Signal Processing: Image Communication 69 (2018), 69–78.
- [30] . 2018. A saliency prediction model on 360 degree images using color dictionary based sparse representation. Signal Processing: Image Communication 69 (2018), 60–68.
- [31] . 2022. Viewing behavior supported visual saliency predictor for 360 degree videos. IEEE Transactions on Circuits and Systems for Video Technology 32, 7 (2022), 4188–4201.
- [32] . 2018. Bridge the gap between VQA and human behavior on omnidirectional video: A large-scale dataset and a deep learning model. In Proceedings of the ACM Conference on Multimedia. Association for Computing Machinery, New York, NY, 932–940.
- [33] . 2018. Multi-frame quality enhancement for compressed video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6664–6673.
- [34] . 2021. MFQE 2.0: A new approach for multi-frame quality enhancement on compressed video. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 3 (2021), 949–963.
- [35] . 2012. Loss-specific training of non-parametric image restoration models: A new state of the art. In Proceedings of the European Conference on Computer Vision. Springer, 112–125.
- [36] . 2014. Reducing artifacts in JPEG decompression via a learned dictionary. IEEE Transactions on Signal Processing 62, 3 (2014), 718–728.
- [37] . 2012. Image deblocking via sparse representation. Signal Processing: Image Communication 27, 6 (2012), 663–677.
- [38] . 2020. Learning a single model with a wide range of quality factors for JPEG image artifacts removal. IEEE Transactions on Image Processing 29 (2020), 8842–8854.
- [39] . 2021. Learning dual priors for JPEG compression artifacts removal. In Proceedings of the IEEE Conference on Computer Vision. 4066–4075.
- [40] . 2022. A feature-enriched deep convolutional neural network for JPEG image compression artifacts reduction and its applications. IEEE Transactions on Neural Networks and Learning Systems 33, 1 (2022), 430–444.
- [41] . 2016. Structure-driven adaptive non-local filter for high efficiency video coding (HEVC). In Proceedings of the Data Compression Conference. 91–100.
- [42] . 2020. Inter-block dependency-based CTU level rate control for HEVC. IEEE Transactions on Broadcasting 66, 1 (2020), 113–126.
- [43] . 2017. Co-projection-plane based 3-D padding for polyhedron projection for 360-degree video. In Proceedings of the IEEE Conference on Multimedia and Expo. 55–60.
- [44] . 2017. Automatic content-aware projection for 360° videos. In Proceedings of the IEEE Conference on Computer Vision. 4753–4761.
- [45] . 2021. Learning compressible 360 video isomers. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 8 (2021), 2697–2709.
- [46] . 2021. Attention-driven tile splitting method for improved efficiency of omnidirectional versatile video coding. In Proceedings of the IEEE Conference on Image Processing. 2149–2153.
- [47] . 2022. FastInter360: A fast inter mode decision for HEVC 360 video coding. IEEE Transactions on Circuits and Systems for Video Technology 32, 5 (2022), 3235–3249.
- [48] . 2020. A switchable deep learning approach for in-loop filtering in video coding. IEEE Transactions on Circuits and Systems for Video Technology 30, 7 (2020), 1871–1887.
- [49] . 2019. Enhancing quality for HEVC compressed videos. IEEE Transactions on Circuits and Systems for Video Technology 29, 7 (2019), 2039–2054.
- [50] . 2017. 360-degree video head movement dataset. In Proceedings of the ACM Conference on Multimedia Systems. Association for Computing Machinery, New York, NY, 199–204.
- [51] . 2018. The prediction of head and eye movement for 360 degree images. Signal Processing: Image Communication 69 (2018), 15–25.
- [52] . 2018. Cube padding for weakly-supervised saliency prediction in 360° videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1420–1429.
- [53] . 2019. Predicting head movement in panoramic video: A deep reinforcement learning approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 11 (2019), 2693–2708.
- [54] . 2018. Gaze prediction in dynamic 360° immersive videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5333–5342.
- [55] . 2018. Spherical CNNs. In Proceedings of the International Conference on Learning Representations.
- [56] . 2021. Learning spherical convolution for 360 recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1–1.
- [57] . 1987. Map Projections – A Working Manual. US Government Printing Office.
- [58] . 2017. Deformable Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision. 764–773.
- [59] L. M. Kells, W. F. Kern, and J. R. Bland. 1940. Plane and Spherical Trigonometry with Tables. US Armed Forces Institute.
- [60] . 2015. Fast R-CNN. In Proceedings of the IEEE Conference on Computer Vision. 1440–1448.
- [61] . 2016. AHG8: WS-PSNR for 360 video objective quality evaluation. Joint Video Exploration Team of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JVET-D0040, 4th Meeting.
- [62] . 2004. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.
- [63] . 2003. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 560–576.
Index Terms
Quality Enhancement of Compressed 360-Degree Videos Using Viewport-based Deep Neural Networks