research-article

Joint Stabilization and Direction of 360° Videos

Published: 18 March 2019

Abstract

Three-hundred-sixty-degree (360°) video provides an immersive experience for viewers, allowing them to freely explore the world by turning their heads. However, creating high-quality 360° video content can be challenging, as viewers may miss important events by looking in the wrong direction, or they may see things that ruin the immersion, such as stitching artifacts and the film crew. We take advantage of the fact that not all directions are equally likely to be observed; due to ergonomic constraints, most viewers are more likely to see content located at “true north,” i.e., in front of them. We therefore propose 360° video direction, where the video is jointly optimized to orient important events to the front of the viewer and visual clutter behind them, while producing smooth camera motion. Unlike traditional video, viewers can still explore the space as desired, but with the knowledge that the most important content is likely to be in front of them. Constraints can be user guided, either added directly on the equirectangular projection or recorded as “guidance” viewing directions while watching the video in a VR headset, or they can be computed automatically, such as via visual saliency or forward-motion direction. To accomplish this, we propose a new motion estimation technique specifically designed for 360° video that outperforms the commonly used five-point algorithm on wide-angle video. We additionally formulate the direction problem as an optimization where a novel parametrization of spherical warping allows us to correct for some degree of parallax effects. We compare our approach to recent methods that address stabilization only and to methods that convert 360° video to narrow field-of-view video. Our pipeline can also enable the viewing of wide-angle non-360° footage in a spherical 360° space, giving an immersive “virtual cinema” experience for a wide range of existing content filmed with first-person cameras.
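
As a rough illustration of the idea of orienting content toward “true north” while keeping the camera motion smooth, the following minimal sketch treats direction as a yaw-only problem. It is not the formulation used in the article: the coordinate convention, the function names, the quadratic energy, and the Gauss-Seidel solver are all assumptions chosen for brevity, and the sketch ignores pitch, roll, parallax, and spherical warping entirely.

```python
import math


def equirect_to_direction(u, v, width, height):
    """Map pixel (u, v) of an equirectangular frame to a unit direction.

    Assumed convention (hypothetical, not taken from the article): the image
    center is "true north" (+z forward), x points right, y points up.
    """
    lon = (u / width - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi)
    lat = (0.5 - v / height) * math.pi        # latitude in [-pi/2, pi/2]
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def yaw_of(direction):
    """Yaw (radians) of a direction relative to the forward (+z) axis."""
    x, _, z = direction
    return math.atan2(x, z)


def rotate_yaw(direction, angle):
    """Rotate a direction about the vertical axis, adding `angle` to its yaw."""
    x, y, z = direction
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def smooth_yaw_path(targets, smoothness=10.0, iterations=500):
    """Trade off following per-frame target yaws against smooth motion.

    Minimizes sum_t (p_t - targets_t)^2 + smoothness * sum_t (p_{t+1} - p_t)^2
    by Gauss-Seidel sweeps. Targets are assumed to be unwrapped already
    (no jumps across +/- pi between consecutive frames).
    """
    path = list(targets)
    n = len(path)
    for _ in range(iterations):
        for t in range(n):
            neighbors = []
            if t > 0:
                neighbors.append(path[t - 1])
            if t < n - 1:
                neighbors.append(path[t + 1])
            # Closed-form minimizer of the energy with all other frames fixed.
            path[t] = (targets[t] + smoothness * sum(neighbors)) / (
                1.0 + smoothness * len(neighbors)
            )
    return path


if __name__ == "__main__":
    # Hypothetical per-frame pixel positions of an "important event".
    width, height = 2048, 1024
    event_pixels = [(1536, 512), (1500, 500), (1580, 520), (1490, 512)]
    target_yaws = [
        yaw_of(equirect_to_direction(u, v, width, height))
        for u, v in event_pixels
    ]
    # Rotating each frame by the negated path value moves the event frontward.
    for yaw in smooth_yaw_path(target_yaws):
        print(f"rotate frame by {-yaw:+.3f} rad")
```

A larger `smoothness` value favors steadier camera motion over keeping the event exactly centered; a value of zero reproduces the raw targets.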

Published in

ACM Transactions on Graphics, Volume 38, Issue 2
April 2019
112 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3313807

Copyright © 2019 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 18 March 2019
• Accepted: 1 January 2019
• Revised: 1 October 2018
• Received: 1 January 2018

Qualifiers

• research-article
• Research
• Refereed
