Abstract
We present a visual analogue for musical rhythm derived from an analysis of motion in video, and show that alignment of visual rhythm with its musical counterpart results in the appearance of dance. Central to our work is the concept of visual beats --- patterns of motion that can be shifted in time to control visual rhythm. By warping visual beats into alignment with musical beats, we can create or manipulate the appearance of dance in video. Using this approach we demonstrate a variety of retargeting applications that control musical synchronization of audio and video: we can change what song performers are dancing to, warp irregular motion into alignment with music so that it appears to be dancing, or search collections of video for moments of accidentally dance-like motion that can be used to synthesize musical performances.
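To make the core idea concrete, below is a minimal sketch of one plausible pipeline: estimate candidate visual beats as peaks of motion deceleration in a dense optical-flow envelope, detect musical beats with librosa's beat tracker, and build a piecewise-linear time warp that maps one onto the other. This is an illustration under stated assumptions, not the paper's implementation; the helper names (`estimate_visual_beats`, `warp_times`), the Farneback-flow deceleration heuristic, and the naive one-to-one beat pairing are ours.

```python
# Sketch only: crude visual-beat detection + musical beat alignment.
# Assumes short, single-shot clips; not the authors' method.
import cv2                      # pip install opencv-python
import librosa                  # pip install librosa
import numpy as np
from scipy.signal import find_peaks

def estimate_visual_beats(video_path):
    """Return candidate visual-beat times (s): peaks of motion deceleration."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    prev, magnitudes = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            # Dense optical flow between consecutive frames.
            flow = cv2.calcOpticalFlowFarneback(
                prev, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            magnitudes.append(np.linalg.norm(flow, axis=2).mean())
        prev = gray
    cap.release()
    # Visual beats as sharp drops in overall motion (deceleration peaks).
    decel = -np.diff(np.asarray(magnitudes))
    peaks, _ = find_peaks(decel, height=decel.std())
    return (peaks + 1) / fps    # rough frame-index -> seconds mapping

def musical_beats(audio_path):
    """Return musical beat times (s) via librosa's beat tracker."""
    y, sr = librosa.load(audio_path)
    _, beat_times = librosa.beat.beat_track(y=y, sr=sr, units='time')
    return beat_times

def warp_times(frame_times, visual, music):
    """Piecewise-linear warp mapping visual beats onto musical beats."""
    n = min(len(visual), len(music))    # naive 1:1 pairing of early beats
    return np.interp(frame_times, visual[:n], music[:n])
```

Resampling the video at the warped timestamps retimes its motion so that visual beats land on the music's beats, which is the alignment the abstract describes. The paper's actual method treats beat saliency, tempo matching, and warp smoothness far more carefully than this pairing does.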
Index Terms
Visual rhythm and beat