Visual rhythm and beat

Published: 30 July 2018

Abstract

We present a visual analogue for musical rhythm derived from an analysis of motion in video, and show that alignment of visual rhythm with its musical counterpart results in the appearance of dance. Central to our work is the concept of visual beats --- patterns of motion that can be shifted in time to control visual rhythm. By warping visual beats into alignment with musical beats, we can create or manipulate the appearance of dance in video. Using this approach we demonstrate a variety of retargeting applications that control musical synchronization of audio and video: we can change what song performers are dancing to, warp irregular motion into alignment with music so that it appears to be dancing, or search collections of video for moments of accidentally dance-like motion that can be used to synthesize musical performances.
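
The abstract describes two technical ingredients: detecting beat-like events in video motion and warping them into alignment with musical beats. As a minimal illustrative sketch of this general idea (not the authors' actual pipeline), the Python code below estimates a per-frame motion envelope from dense optical flow, treats sharp decelerations as candidate visual beats, detects musical beats with librosa, and pairs the two into anchors for a monotonic time warp. The deceleration heuristic, the peak-picking threshold, and all function names are assumptions made for illustration.

```python
# Sketch only: flow-based visual beat candidates plus librosa musical
# beats, paired into anchors for a time warp. Heuristics are assumed,
# not taken from the paper.
import cv2
import numpy as np
import librosa

def visual_beat_candidates(video_path):
    """Return candidate visual beat times (seconds) for a video file."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    prev_gray, speeds = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None:
            flow = cv2.calcOpticalFlowFarneback(
                prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            # Mean per-pixel flow magnitude = overall motion "speed".
            speeds.append(np.mean(np.linalg.norm(flow, axis=2)))
        prev_gray = gray
    cap.release()
    speed = np.asarray(speeds)
    # Beat-like events tend to coincide with sharp decelerations
    # (motion "landing"), so rectify the negative derivative of speed.
    decel = np.maximum(-np.diff(speed, prepend=speed[:1]), 0.0)
    thresh = decel.mean() + decel.std()
    peaks = [i for i in range(1, len(decel) - 1)
             if decel[i - 1] < decel[i] >= decel[i + 1] and decel[i] > thresh]
    return np.asarray(peaks) / fps

def musical_beat_times(audio_path):
    """Return musical beat times (seconds) via librosa's beat tracker."""
    y, sr = librosa.load(audio_path)
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    return librosa.frames_to_time(beat_frames, sr=sr)

def naive_warp_anchors(visual_beats, music_beats):
    """Pair each visual beat with its nearest musical beat, keeping the
    mapping monotonic; the pairs anchor a piecewise-linear time warp."""
    anchors, last = [], -np.inf
    for v in visual_beats:
        m = music_beats[np.argmin(np.abs(music_beats - v))]
        if m > last:
            anchors.append((v, m))
            last = m
    return anchors
```

Resampling video frames along the piecewise-linear warp defined by these anchors would pull the visual beats onto the musical grid, which is the spirit of the retargeting applications described above; a real system would also need to handle tempo ambiguity and preserve motion quality under warping.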

Supplemental Material

a122-davis.mp4

Published in

ACM Transactions on Graphics, Volume 37, Issue 4 (August 2018), 1670 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3197517

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

research-article
