Abstract
Multi-view video plus depth (MVD) is a promising and widely adopted data representation for future 3D visual applications and interactive media. However, compression distortions in depth videos impede the development of such applications, and filters are needed to enhance quality at the terminal side. Cross-view priors are a natural ingredient for filter design, but these priors are themselves distorted by compression, so previous research could hardly account for their contribution. In this article, we propose a cross-view optimized filter for depth map quality enhancement that makes full use of inner- and cross-view priors. We focus on evaluating the contributions of distorted cross-view priors when filtering the depth of the current view, so that both inner- and cross-view priors can be involved in the filter design. Distortions of cross-view priors are therefore no longer a barrier. To this end, a mutual information guided cross-view consistency measure is designed to evaluate the contributions of cross-view priors under the compression distortions of MVD. Then, within a global optimization framework, both inner- and cross-view priors are modeled and used to minimize a designed energy function that captures both data accuracy and spatial smoothness. The experimental results show that the proposed model outperforms state-of-the-art methods, with average gains of 3.289 dB in peak signal-to-noise ratio and 0.0407 in structural similarity. In the subjective evaluations, object details and structure information are recovered in the compressed depth video. We also verify our method in several practical applications, including virtual view synthesis for smooth interaction and point cloud generation for 3D modeling, where accuracy is evaluated. In these verifications, the ringing and malposition artifacts on object contours are properly handled for interactive video, and discontinuous object surfaces are restored for 3D modeling. All of these results suggest that compression distortions in MVD can be properly filtered by the proposed model, which provides a promising solution for future bandwidth-constrained 3D and interactive visual applications.
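The mutual information measure mentioned in the abstract can be pictured with the following minimal sketch. It is not the authors' formulation, which the abstract does not specify. The function name `mutual_information`, the 8-bit depth range, the bin count, and the idea of comparing a patch from the current view with a patch already warped from a neighboring view are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(a, b, levels=8, max_value=255):
    """Estimate mutual information (in bits) between two equally sized
    sequences of depth values, after quantizing them into `levels` bins.

    Hypothetical helper for illustration only; assumes depth values
    lie in the range [0, max_value].
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    def quantize(v):
        # Map a depth value to a coarse bin index in [0, levels - 1].
        return min(int(v) * levels // (max_value + 1), levels - 1)

    qa = [quantize(v) for v in a]
    qb = [quantize(v) for v in b]
    n = len(qa)

    joint = Counter(zip(qa, qb))  # joint histogram of bin pairs
    hist_a = Counter(qa)          # marginal histogram of the first patch
    hist_b = Counter(qb)          # marginal histogram of the second patch

    mi = 0.0
    for (i, j), count in joint.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (hist_a[i] * hist_b[j]))
    return mi


# Assumed example data: a flattened patch from the current view and two
# candidate patches from a neighboring view (already warped).
current = [0, 0, 255, 255]
consistent = [0, 0, 255, 255]
inconsistent = [0, 255, 0, 255]

score_consistent = mutual_information(current, consistent)
score_inconsistent = mutual_information(current, inconsistent)
```

In a sketch like this, a higher score indicates that the neighboring patch shares more statistical structure with the current one, so it could be given more weight as a prior. How the score enters the energy function is left open here, since the abstract does not state it.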
Make Full Use of Priors: Cross-View Optimized Filter for Multi-View Depth Enhancement