research-article

Reconstructing 3D Face Models by Incremental Aggregation and Refinement of Depth Frames

Published: 23 January 2019

Abstract

Face recognition from two-dimensional (2D) still images and videos is quite successful, even under “in the wild” conditions. By contrast, results are less consolidated when face data come from non-conventional cameras, such as infrared or depth sensors. In this article, we investigate the latter scenario, assuming that a low-resolution depth camera is used to perform face recognition in an uncooperative context. To this end, we first propose to automatically select, from the camera's depth sequence, a set of frames that provide a good view of the face in terms of pose and distance. Then, we design a progressive refinement approach that reconstructs a higher-resolution model from the selected low-resolution frames. This process accounts for the anisotropic error of both the points already in the current 3D model and the points in each newly acquired frame, so that the refinement step can progressively adjust the point positions in the model using a Kalman-like estimation. The quality of a reconstructed model is evaluated by the error between it and the corresponding high-resolution scan, used as ground truth. In addition, we perform face recognition using the reconstructed models as probes against two galleries: one of reconstructed models and one of high-resolution scans. The results confirm that the reconstructed models can be used effectively for face recognition.
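
The Kalman-like refinement step can be illustrated with a minimal sketch. It assumes, as a common model for consumer depth cameras, that each point's error is Gaussian with a covariance that is larger along the camera's viewing ray than across it, and that every frame has already been registered to the model's coordinate system. The helper names `ray_aligned_covariance` and `fuse`, and the parameters `sigma_lateral` and `sigma_axial`, are hypothetical; the update shown is a generic covariance-weighted (Kalman) fusion of two estimates, not the authors' exact formulation.

```python
import numpy as np


def ray_aligned_covariance(point, sigma_lateral, sigma_axial, camera_center=None):
    """Anisotropic 3x3 covariance for a point seen from camera_center.

    Assumption: the error std is sigma_axial along the viewing ray and
    sigma_lateral in the plane orthogonal to it.
    """
    center = np.zeros(3) if camera_center is None else np.asarray(camera_center, dtype=float)
    ray = np.asarray(point, dtype=float) - center
    ray /= np.linalg.norm(ray)
    along = np.outer(ray, ray)      # projector onto the viewing ray
    across = np.eye(3) - along      # projector onto the orthogonal plane
    return sigma_axial**2 * along + sigma_lateral**2 * across


def fuse(x_model, P_model, z_frame, R_frame):
    """Kalman-style update of a model point (x_model, P_model) with an
    observation z_frame of covariance R_frame from a new depth frame."""
    S = P_model + R_frame
    # Gain K = P S^-1; since P and S are symmetric, solve(S, P).T equals P S^-1.
    K = np.linalg.solve(S, P_model).T
    x_new = x_model + K @ (z_frame - x_model)
    P_new = (np.eye(len(x_model)) - K) @ P_model
    return x_new, P_new


if __name__ == "__main__":
    p = np.array([0.0, 0.0, 1.0])
    # Model point observed along the z axis (camera at the origin).
    P = ray_aligned_covariance(p, 1.0, 3.0)
    # New frame observing the same point along the x axis.
    R = ray_aligned_covariance(p, 1.0, 3.0, camera_center=[-1.0, 0.0, 1.0])
    z = np.array([0.02, 0.0, 1.01])
    x_new, P_new = fuse(p, P, z, R)
    print(np.round(np.diag(P_new), 3))
```

In this example the model covariance is diag(1, 1, 9) and the frame covariance is diag(9, 1, 1); the fused covariance is diag(0.9, 0.5, 0.9), so each view corrects the other's uncertain axial direction. This is the intuition behind weighting point updates by anisotropic error rather than simply averaging positions.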

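The evaluation compares each reconstructed model with its high-resolution scan. The abstract does not specify the error measure, so the sketch below assumes one common choice: the root-mean-square of the distances from each reconstructed point to its nearest ground-truth point, computed after the two point clouds have been aligned in the same units. The function name `nn_rmse` is hypothetical.

```python
import numpy as np


def nn_rmse(reconstructed, ground_truth):
    """RMSE of nearest-neighbour distances from reconstructed points (N x 3)
    to ground-truth points (M x 3). Assumes the clouds are already aligned."""
    # Pairwise differences, shape (N, M, 3), then squared distances (N, M).
    diff = reconstructed[:, None, :] - ground_truth[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Squared distance to the closest ground-truth point for each reconstructed point.
    nearest = d2.min(axis=1)
    return float(np.sqrt(nearest.mean()))


if __name__ == "__main__":
    gt = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
    recon = gt + np.array([0.0, 0.0, 0.5])  # every point shifted by 0.5 along z
    print(nn_rmse(recon, gt))
```

The brute-force search needs O(N·M) memory; for full-resolution scans, a k-d tree such as `scipy.spatial.cKDTree` would be the usual replacement for the nearest-neighbour step.
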
          • Published in

            ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 1
            February 2019
            265 pages
            ISSN: 1551-6857
            EISSN: 1551-6865
            DOI: 10.1145/3309717

            Copyright © 2019 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 23 January 2019
            • Accepted: 1 October 2018
            • Revised: 1 August 2018
            • Received: 1 March 2018

            Qualifiers

            • research-article
            • Research
            • Refereed
