research-article
Open Access

Synthetic defocus and look-ahead autofocus for casual videography

Published: 12 July 2019

Editorial Notes

A corrigendum was issued for this article on July 19, 2019. You can download the corrigendum from the source materials section of this citation page.


Abstract

In cinema, large camera lenses create beautiful shallow depth of field (DOF), but make focusing difficult and expensive. Accurate cinema focus usually relies on a script and a person to control focus in real time. Casual videographers often crave cinematic focus, but fail to achieve it. We either sacrifice shallow DOF, as in smartphone videos, or struggle to deliver accurate focus, as in videos from larger cameras. This paper is about a new approach in the pursuit of cinematic focus for casual videography. We present a system that synthetically renders refocusable video from a deep DOF video shot with a smartphone, and analyzes future video frames to deliver context-aware autofocus for the current frame. To create refocusable video, we extend recent machine learning methods designed for still photography, contributing a new dataset for machine training, a rendering model better suited to cinema focus, and a filtering solution for temporal coherence. To choose focus accurately for each frame, we demonstrate autofocus that looks at upcoming video frames and applies AI-assist modules such as motion, face, audio and saliency detection. We also show that autofocus benefits from machine learning and a large-scale video dataset with focus annotation, where we use our RVR-LAAF GUI to create this sizable dataset efficiently. We deliver, for example, a shallow DOF video where the autofocus transitions onto each person before she begins to speak. This is impossible for conventional camera autofocus because it would require seeing into the future.
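As a rough illustration (not the authors' implementation), the look-ahead idea can be sketched as an offline focus-planning pass: because refocusing happens after capture, the focus distance for the current frame may depend on events detected in future frames. In the Python sketch below, FocusEvent and plan_focus are hypothetical names, and the linear focus ramp stands in for whatever transition profile the real system uses.

    # Minimal sketch of look-ahead focus planning: each focus pull is
    # scheduled to complete just as its triggering event (e.g., a detected
    # speech onset) begins, which requires knowing about future frames.
    # `FocusEvent` and `plan_focus` are illustrative names, not the paper's API.
    from dataclasses import dataclass

    @dataclass
    class FocusEvent:
        frame: int    # frame index where the event begins (e.g., a face starts speaking)
        depth: float  # scene depth, in meters, of the subject to bring into focus

    def plan_focus(num_frames, events, ramp=12, initial_depth=2.0):
        """Return one focus distance per frame. Unlike reactive autofocus,
        the pull toward each event's subject starts `ramp` frames *before*
        the event, so the transition finishes by the time the event starts."""
        focus = [initial_depth] * num_frames
        prev_depth = initial_depth
        for e in sorted(events, key=lambda ev: ev.frame):
            start = max(0, e.frame - ramp)   # begin the pull early
            span = max(1, e.frame - start)
            for t in range(start, num_frames):
                if t < e.frame:
                    a = (t - start) / span   # linear focus ramp
                    focus[t] = (1 - a) * prev_depth + a * e.depth
                else:
                    focus[t] = e.depth       # hold focus on the subject
            prev_depth = e.depth
        return focus

    # Example: at 30 fps, a speaker at 2.5 m starts talking at frame 90;
    # the 12-frame (0.4 s) pull onto her completes right as she begins to speak.
    plan = plan_focus(150, [FocusEvent(frame=90, depth=2.5)])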




      • Published in

        ACM Transactions on Graphics, Volume 38, Issue 4 (August 2019), 1480 pages
        ISSN: 0730-0301
        EISSN: 1557-7368
        DOI: 10.1145/3306346

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 July 2019
        • Published in TOG Volume 38, Issue 4
