
Quality Enhancement of Compressed 360-Degree Videos Using Viewport-based Deep Neural Networks

Published: 06 February 2023

Abstract

360-degree video presents omnidirectional views on a bounding sphere and is therefore also called omnidirectional video. A viewer sees only the content inside the current viewport, selected through head movement, i.e., only a small portion of the 360-degree content is exposed at any given time. The viewport quality is therefore of particular importance for 360-degree videos. In this article, we propose a quality enhancement scheme for compressed 360-degree videos using viewport-based deep neural networks, named V-DNN. V-DNN is mainly composed of two modules: a viewport prediction network (VPN) and a viewport quality enhancement network (VQEN). The VPN, built on spherical convolution and 2D convolution, generates potential viewports for the omnidirectional video. The VQEN takes the current viewport and its reference viewports as input and predicts an enhancement residual for the current viewport based on bidirectional offset prediction and spatio-temporal deformable convolutions. Compared with the HM16.16 baseline at QP = 37 under the Low Delay P (LDP) configuration, experimental results show that V-DNN achieves average gains of 0.605 dB in viewport-based ΔPSNR and 0.0139 in ΔMS-SSIM, which are 0.379 dB (59.63%) and 0.0073 (110.61%) higher, respectively, than the multi-frame quality enhancement (MFQE-2.0) scheme at QP = 37. Moreover, V-DNN consistently outperforms MFQE-1.0, MFQE-2.0, and the HM16.16 baseline at the other QPs in terms of ΔPSNR, ΔWS-PSNR, and ΔMS-SSIM.
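The abstract reports gains in viewport-based ΔPSNR, ΔWS-PSNR, and ΔMS-SSIM. As a minimal sketch of how such numbers are computed (the function names and test framing are my own, not from the paper; the WS-PSNR row weights follow the standard weighted-to-spherically-uniform definition for equirectangular frames), ΔPSNR is simply the PSNR of the enhanced viewport minus that of the compressed viewport, averaged over viewports:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Standard PSNR between two equal-sized 8-bit frames."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ws_psnr(ref, test, peak=255.0):
    """WS-PSNR for equirectangular frames: each row's squared error is
    weighted by the cosine of its latitude, so over-sampled polar pixels
    count less than equatorial ones."""
    h, w = ref.shape[:2]
    rows = np.arange(h)
    weights = np.cos((rows + 0.5 - h / 2.0) * np.pi / h)   # per-row latitude weight
    wmap = np.broadcast_to(weights[:, None], (h, w))
    err = (ref.astype(np.float64) - test.astype(np.float64)) ** 2
    wmse = np.sum(err * wmap) / np.sum(wmap)
    return float("inf") if wmse == 0 else 10.0 * np.log10(peak ** 2 / wmse)

def delta_psnr(gt_views, compressed_views, enhanced_views):
    """Delta-PSNR: quality gain of enhanced over compressed viewports,
    each measured against the ground-truth viewport, averaged."""
    gains = [psnr(g, e) - psnr(g, c)
             for g, c, e in zip(gt_views, compressed_views, enhanced_views)]
    return sum(gains) / len(gains)
```

Note that for a spatially uniform error, WS-PSNR reduces to plain PSNR, since the weighting only redistributes how location-dependent errors contribute.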


• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2
  March 2023, 540 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3572860
  • Editor: Abdulmotaleb El Saddik

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 6 February 2023
      • Online AM: 5 August 2022
      • Accepted: 18 July 2022
      • Revised: 3 June 2022
      • Received: 1 February 2022

Qualifiers: research-article, refereed