
Make Full Use of Priors: Cross-View Optimized Filter for Multi-View Depth Enhancement

Published: 17 December 2020

Abstract

Multi-view video plus depth (MVD) is a promising and widely adopted data representation for future 3D visual applications and interactive media. However, compression distortion of depth videos impedes the development of such applications, and filters are needed to enhance quality at the terminal side. Cross-view priors are a natural ingredient for filter design, but because these priors are themselves distorted by compression, previous research has largely been unable to account for their contribution. In this article, we propose a cross-view optimized filter for depth map quality enhancement that makes full use of both inner- and cross-view priors. We evaluate the contribution of distorted cross-view priors to filtering the depth of the current view, so that both inner- and cross-view priors can be incorporated into the filter design; distortion of cross-view priors is thus no longer the barrier it once was. To this end, a mutual-information-guided cross-view consistency measure is designed to assess the contribution of cross-view priors under the compression distortions of MVD. Then, within a global optimization framework, both inner- and cross-view priors are modeled and used to minimize an energy function that accounts for both data accuracy and spatial smoothness. Experimental results show that the proposed model outperforms state-of-the-art methods, with average gains of 3.289 dB in peak signal-to-noise ratio and 0.0407 in structural similarity. In subjective evaluations, object details and structure information are recovered from the compressed depth video. We further verify our method in several practical applications, including virtual view synthesis for smooth interaction and point clouds for 3D modeling with accuracy evaluation. In these verifications, ringing and malposition artifacts on object contours are properly handled for interactive video, and discontinuous object surfaces are restored for 3D modeling. All of these results suggest that compression distortions in MVD can be effectively filtered by the proposed model, offering a promising solution for future bandwidth-constrained 3D and interactive visual applications.
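The abstract's two core ideas — weighting a compression-distorted cross-view prior by its mutual information with the current view, and minimizing a quadratic energy that balances data accuracy against spatial smoothness — can be sketched in a few lines. The code below is an illustrative toy, not the authors' actual model: the functions `mutual_information`, `grid_laplacian`, and `cross_view_filter`, the entropy normalization of the cross-view weight, and the 4-connected Laplacian smoothness term are all simplifying assumptions introduced here.

```python
import numpy as np

def mutual_information(depth_a, depth_b, bins=16):
    """Plug-in mutual information estimate from the joint histogram
    of two depth maps (higher MI = more consistent views)."""
    joint, _, _ = np.histogram2d(depth_a.ravel(), depth_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0  # sum only over occupied histogram cells
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

def grid_laplacian(h, w):
    """Dense graph Laplacian of a 4-connected h-by-w pixel grid;
    d^T L d is the spatial-smoothness term of the energy."""
    n = h * w
    L = np.zeros((n, n))
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
                if y + dy < h and x + dx < w:
                    j = (y + dy) * w + (x + dx)
                    L[i, i] += 1.0
                    L[j, j] += 1.0
                    L[i, j] -= 1.0
                    L[j, i] -= 1.0
    return L

def cross_view_filter(d_inner, d_cross, lam=1.0, bins=16):
    """Minimize the quadratic energy
        E(d) = ||d - d_inner||^2 + w_cv * ||d - d_cross||^2 + lam * d^T L d,
    where the cross-view data weight w_cv is the mutual information
    between the two views normalized by the current view's entropy,
    so a heavily distorted cross-view prior contributes less."""
    h, w = d_inner.shape
    mi = mutual_information(d_inner, d_cross, bins)
    entropy = mutual_information(d_inner, d_inner, bins)  # MI(x, x) = H(x)
    w_cv = mi / max(entropy, 1e-12)
    n = h * w
    # The quadratic energy reduces to one SPD linear system: A d = b.
    A = (1.0 + w_cv) * np.eye(n) + lam * grid_laplacian(h, w)
    b = d_inner.ravel() + w_cv * d_cross.ravel()
    return np.linalg.solve(A, b).reshape(h, w)
```

The direct dense solve is only practical for tiny patches; a real implementation of this kind of global optimization would use a sparse or iterative solver, but the structure of the energy is the same.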


Published in: ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 4 (November 2020), 372 pages.
ISSN: 1551-6857; EISSN: 1551-6865; DOI: 10.1145/3444749
Copyright © 2020 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
• Received: 1 October 2019
• Revised: 1 May 2020
• Accepted: 1 June 2020
• Published: 17 December 2020
