research-article

Tools for placing cuts and transitions in interview video

Published: 01 July 2012

Abstract

We present a set of tools designed to help editors place cuts and create transitions in interview video. To help place cuts, our interface links a text transcript of the video to the corresponding locations in the raw footage. It also visualizes the suitability of cut locations by analyzing audio/visual features of the raw footage to find frames where the speaker is relatively quiet and still. With these tools, editors can directly highlight segments of text, check whether the endpoints are suitable cut locations, and, if so, simply delete the text to make the edit. For each cut, our system generates both visible transitions (e.g., jump cut, fade) and seamless, hidden transitions. We present a hierarchical, graph-based algorithm for efficiently generating hidden transitions that considers visual features specific to interview footage. We also describe a new data-driven technique for setting the timing of the hidden transition. Finally, our tools offer a one-click method for seamlessly removing 'ums' and repeated words, as well as for inserting natural-looking pauses to emphasize semantic content. We apply our tools to edit a variety of interviews and also show how they can be used to quickly compose multiple takes of an actor narrating a story.
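The cut-suitability visualization described above rests on a simple idea: a frame is a good cut point when the speaker is both quiet and still. The abstract does not give the authors' actual feature set, so the sketch below is only a minimal illustration of that idea, assuming grayscale video frames in a NumPy array and a mono audio track; the function names, the normalization, and the audio/motion weighting are our own illustrative choices, not the paper's method.

```python
# Minimal sketch (not the paper's algorithm): score each frame by how quiet
# (low audio energy) and still (low inter-frame motion) it is. Assumes
# `frames` is an (N, H, W) grayscale array, `samples` is a mono audio track,
# and `frame_starts[i]` is the audio sample index where frame i begins.
import numpy as np

def audio_energy(samples, frame_starts, win):
    """Per-frame RMS energy of the audio aligned with each video frame."""
    s = samples.astype(np.float64)
    return np.array([np.sqrt(np.mean(s[t:t + win] ** 2)) for t in frame_starts])

def motion_magnitude(frames):
    """Mean absolute pixel difference between consecutive frames."""
    d = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    return np.concatenate([[d[0]], d])  # pad so length matches frame count

def cut_suitability(frames, samples, frame_starts, win, w_audio=0.5):
    """Higher score = quieter and stiller = better candidate cut point."""
    e = audio_energy(samples, frame_starts, win)
    m = motion_magnitude(frames)
    # Normalize each cue to [0, 1], then invert so quiet/still frames score high.
    e = (e - e.min()) / (np.ptp(e) + 1e-8)
    m = (m - m.min()) / (np.ptp(m) + 1e-8)
    return w_audio * (1.0 - e) + (1.0 - w_audio) * (1.0 - m)
```

An editor-facing tool would threshold or color-map these per-frame scores alongside the transcript; the equal 0.5 weighting of the two cues here is arbitrary.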
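The abstract frames hidden-transition synthesis as a hierarchical, graph-based search but does not spell out the graph construction or the interview-specific features. For intuition only, the sketch below shows a generic, non-hierarchical version of the underlying idea, in the spirit of video textures: frames are nodes, nearby frames are connected with edges weighted by visual dissimilarity, and a low-cost path from the frame before the cut to the frame after it yields a sequence that can be played (or morphed) through to hide the cut. Everything here, including the plain L2 frame distance and the temporal neighborhood size `k`, is an assumption for illustration.

```python
# Generic graph-path sketch of a hidden transition (not the paper's
# hierarchical algorithm): plain Dijkstra over a frame-similarity graph.
import heapq
import numpy as np

def frame_cost(a, b):
    """Visual dissimilarity between two grayscale frames (L2 distance)."""
    return float(np.linalg.norm(a.astype(np.float32) - b.astype(np.float32)))

def transition_path(frames, src, dst, k=30):
    """Lowest-cost path of frame indices from `src` to `dst`, where each
    frame connects to its k temporal neighbors on either side."""
    n = len(frames)
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in range(max(0, u - k), min(n, u + k + 1)):
            if v == u:
                continue
            nd = d + frame_cost(frames[u], frames[v])
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from dst to recover the frame sequence to play.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

A practical implementation would search hierarchically over feature descriptors rather than raw pixels to keep the graph tractable, which is presumably what motivates the paper's hierarchical formulation.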


Supplemental Material

tp168_12.mp4



Published in

ACM Transactions on Graphics, Volume 31, Issue 4 (July 2012), 935 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2185520

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

