Abstract
Controlling the manner in which a character moves in a real-time animation system is a challenging task with useful applications. Existing style transfer systems require access to a reference content motion clip; in a real-time system, however, the future motion content is unknown and liable to change with user input. In this work we present a style modelling system that uses an animation synthesis network to model motion content based on local motion phases. An additional style modulation network uses feature-wise transformations to modulate style in real time. To evaluate our method, we create and release a new style modelling dataset, 100STYLE, containing over 4 million frames of stylised locomotion data in 100 different styles that present a number of challenges for existing systems. To model these styles, we extend the local phase calculation with a contact-free formulation. Compared to other methods for real-time style modelling, our system is more robust and efficient in its style representation while improving motion quality.
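The feature-wise transformations mentioned in the abstract follow the general FiLM pattern: each feature channel of the content network's activations is scaled and shifted by style-dependent parameters. The sketch below is illustrative only, not the paper's implementation; all names, shapes, and values are assumptions.

```python
import numpy as np

def film_modulate(features, gamma, beta):
    """Feature-wise linear modulation (FiLM): scale and shift each
    feature channel by style-dependent parameters gamma and beta."""
    return gamma * features + beta

# Hypothetical example: activations for 2 frames x 4 feature channels,
# modulated by per-channel style parameters predicted elsewhere.
features = np.ones((2, 4))
gamma = np.array([0.5, 1.0, 2.0, 1.5])   # style-dependent scales
beta = np.array([0.1, 0.0, -0.2, 0.3])   # style-dependent shifts
styled = film_modulate(features, gamma, beta)
```

Because the transformation is a per-channel affine map, swapping in a different (gamma, beta) pair changes the style of the output without touching the content network's weights, which is what makes this kind of modulation cheap enough for real-time use.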
Supplemental Material
Available for Download
Supplemental movie, appendix, image, and software files for "Real-Time Style Modelling of Human Locomotion via Feature-Wise Transformations and Local Motion Phases".