Abstract
Physics-based character animation has seen significant advances in recent years with the adoption of deep reinforcement learning (DRL). However, DRL-based methods are usually computationally expensive, and their performance depends crucially on the choice of hyperparameters. Tuning hyperparameters for these methods often requires repeated training of control policies, which is even more computationally prohibitive. In this work, we propose a novel Curriculum-based Multi-Fidelity Bayesian Optimization (CMFBO) framework for efficient hyperparameter optimization of DRL-based character control systems. Using curriculum-based task difficulty as the fidelity criterion, our method improves search efficiency by gradually pruning the search space through evaluation on easier motor-skill tasks. We evaluate our method on two physics-based character control tasks: character morphology optimization and hyperparameter tuning of DeepMimic. Our algorithm significantly outperforms state-of-the-art hyperparameter optimization methods applicable to physics-based character animation. In particular, we show that hyperparameters optimized through our algorithm result in at least a 5x efficiency gain compared to the author-released settings in DeepMimic.
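The pruning idea sketched in the abstract can be illustrated with a minimal, hypothetical example. This is not the paper's CMFBO implementation: the Gaussian-process surrogate and acquisition function are omitted for brevity, and `toy_return`, the hyperparameter ranges, and the fidelity schedule are all illustrative assumptions. The sketch only shows the curriculum-as-fidelity mechanism: candidates are scored on the easiest task first, and only the best fraction is promoted to harder (more expensive) tasks.

```python
import random

def toy_return(hparams, fidelity):
    """Hypothetical stand-in for training a control policy at a given
    curriculum difficulty (fidelity) and measuring its return."""
    lr, clip = hparams
    # Toy objective: performance peaks near lr=3e-4, clip=0.2; harder
    # tasks (higher fidelity) amplify the penalty for bad settings.
    score = -((lr - 3e-4) / 3e-4) ** 2 - ((clip - 0.2) / 0.2) ** 2
    return fidelity * score + random.gauss(0, 0.01)

def curriculum_search(n_candidates=32, fidelities=(0.25, 0.5, 1.0), keep=0.5):
    """Score all candidates on the easiest task, keep the best fraction,
    and re-evaluate the survivors on progressively harder tasks."""
    pool = [(random.uniform(1e-5, 1e-3), random.uniform(0.05, 0.4))
            for _ in range(n_candidates)]
    for f in fidelities:
        ranked = sorted(pool, key=lambda h: toy_return(h, f), reverse=True)
        pool = ranked[:max(1, int(len(ranked) * keep))]
    return pool[0]  # best surviving hyperparameter setting

best_lr, best_clip = curriculum_search()
```

Because most candidates are eliminated at low fidelity, the bulk of the evaluation budget is spent on easy, cheap tasks; only a handful of settings ever reach full-difficulty training, which is where the efficiency gain over single-fidelity search comes from.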
Supplemental Material
Supplemental movie, appendix, image, and software files for Efficient Hyperparameter Optimization for Physics-based Character Animation.
References
- Shailen Agrawal, Shuo Shen, and Michiel van de Panne. 2014. Diverse motions and character shapes for simulated skills. IEEE Transactions on Visualization and Computer Graphics 20, 10 (2014), 1345--1355.
- Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, New York, NY, USA, 41--48.
- Kevin Bergamin, Simon Clavet, Daniel Holden, and James Richard Forbes. 2019. DReCon: data-driven responsive control of physics-based characters. ACM Transactions on Graphics (TOG) 38, 6 (2019), 1--11.
- Josh Bongard. 2011. Morphological change in machines accelerates the evolution of robust behavior. Proceedings of the National Academy of Sciences 108, 4 (2011), 1234--1239.
- Eric Brochu, Tyson Brochu, and Nando de Freitas. 2010. A Bayesian interactive optimization approach to procedural animation design. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation. 103--112.
- Eric Brochu, Abhijeet Ghosh, and Nando de Freitas. 2007. Preference galleries for material design. In ACM SIGGRAPH 2007 Posters. Article 105.
- Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016).
- Stelian Coros, Philippe Beaudoin, and Michiel Van de Panne. 2010. Generalized biped walking control. ACM Transactions on Graphics (TOG) 29, 4 (2010), 1--9.
- Stelian Coros, Andrej Karpathy, Ben Jones, Lionel Reveret, and Michiel Van De Panne. 2011. Locomotion skills for simulated quadrupeds. ACM Transactions on Graphics (TOG) 30, 4 (2011), 1--12.
- Erwin Coumans and Yunfei Bai. 2016--2019. PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org.
- Martin De Lasa, Igor Mordatch, and Aaron Hertzmann. 2010. Feature-based locomotion controllers. ACM Transactions on Graphics (TOG) 29, 4 (2010), 1--10.
- Martin L Felis and Katja Mombaur. 2016. Synthesis of full-body 3D human gait using optimal control methods. In 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 1560--1566.
- Scott Fujimoto, Herke Van Hoof, and David Meger. 2018. Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477 (2018).
- Thomas Geijtenbeek, Michiel Van De Panne, and A Frank Van Der Stappen. 2013. Flexible muscle-based locomotion for bipedal creatures. ACM Transactions on Graphics (TOG) 32, 6 (2013), 1--11.
- GPy. 2012--. GPy: A Gaussian process framework in Python. http://github.com/SheffieldML/GPy.
- David Ha. 2019. Reinforcement learning for improving agent design. Artificial Life 25, 4 (2019), 352--365.
- Sehoon Ha, Stelian Coros, Alexander Alspach, Joohyung Kim, and Katsu Yamane. 2017. Joint optimization of robot design and motion parameters using the implicit function theorem. In Robotics: Science and Systems.
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290 (2018).
- Perttu Hämäläinen, Joose Rajamäki, and C Karen Liu. 2015. Online control of simulated humanoids using particle belief propagation. ACM Transactions on Graphics (TOG) 34, 4 (2015), 1--13.
- Nikolaus Hansen. 2006. The CMA evolution strategy: a comparing review. In Towards a New Evolutionary Computation. Springer, 75--102.
- Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, SM Eslami, et al. 2017. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286 (2017).
- Jessica K Hodgins, Wayne L Wooten, David C Brogan, and James F O'Brien. 1995. Animating human athletics. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques. 71--78.
- Sha Hu, Zeshi Yang, and Greg Mori. 2020. Neural fidelity warping for efficient robot morphology design. arXiv preprint arXiv:2012.04195 (2020).
- Wenlong Huang, Igor Mordatch, and Deepak Pathak. 2020. One policy to control them all: Shared modular policies for agent-agnostic control. In International Conference on Machine Learning. PMLR, 4455--4464.
- Sumit Jain, Yuting Ye, and C Karen Liu. 2009. Optimization-based interactive motion synthesis. ACM Transactions on Graphics (TOG) 28, 1 (2009), 1--12.
- Noémie Jaquier, Leonel Rozo, Sylvain Calinon, and Mathias Bürger. 2020. Bayesian optimization meets Riemannian manifolds in robot learning. In Conference on Robot Learning. PMLR, 233--246.
- Donald R Jones, Matthias Schonlau, and William J Welch. 1998. Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 4 (1998), 455--492.
- Kirthevasan Kandasamy, Gautam Dasarathy, Junier B Oliva, Jeff Schneider, and Barnabás Póczos. 2016. Gaussian process bandit optimisation with multi-fidelity evaluations. In Advances in Neural Information Processing Systems. 992--1000.
- Kirthevasan Kandasamy, Gautam Dasarathy, Jeff Schneider, and Barnabás Póczos. 2017. Multi-fidelity Bayesian optimisation with continuous approximations. Advances in Neural Information Processing Systems (2017), 1799--1808.
- Andrej Karpathy and Michiel Van De Panne. 2012. Curriculum learning for motor skills. In Canadian Conference on Artificial Intelligence. Springer, 325--330.
- Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. 2017. Fast Bayesian optimization of machine learning hyperparameters on large datasets. In Artificial Intelligence and Statistics. PMLR, 528--536.
- Ilya Kostrikov. 2018. PyTorch implementations of reinforcement learning algorithms. https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail.
- Yuki Koyama, Issei Sato, and Masataka Goto. 2020. Sequential gallery for interactive visual design optimization. ACM Transactions on Graphics 39, 4 (Jul 2020). https://doi.org/10.1145/3386569.3392444
- Yuki Koyama, Issei Sato, Daisuke Sakamoto, and Takeo Igarashi. 2017. Sequential line search for efficient visual design optimization by crowds. ACM Transactions on Graphics (TOG) 36, 4 (2017), 1--11.
- Thomas Liao, Grant Wang, Brian Yang, Rene Lee, Kristofer Pister, Sergey Levine, and Roberto Calandra. 2019. Data-efficient learning of morphology and controller for a microrobot. In 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2488--2494.
- Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. International Conference on Learning Representations (2016).
- Hod Lipson and Jordan B Pollack. 2000. Automatic design and manufacture of robotic lifeforms. Nature 406, 6799 (2000), 974--978.
- Dong C Liu and Jorge Nocedal. 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming 45, 1--3 (1989), 503--528.
- Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. 2015. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (TOG) 34, 6 (2015), 1--16.
- Kevin Sebastian Luck, Heni Ben Amor, and Roberto Calandra. 2020. Data-efficient co-adaptation of morphology and behaviour with deep reinforcement learning. In Conference on Robot Learning. PMLR, 854--869.
- Ying-Sheng Luo, Jonathan Hans Soeseno, Trista Pei-Chun Chen, and Wei-Chao Chen. 2020. CARL: Controllable Agent with Reinforcement Learning for Quadruped Locomotion. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2020) 39, 4 (2020), 10 pages.
- Li-Ke Ma, Zeshi Yang, Tong Xin, Baining Guo, and KangKang Yin. 2021. Learning and exploring motor skills with spacetime bounds. Computer Graphics Forum 40, 2 (2021).
- Josh Merel, Saran Tunyasuvunakool, Arun Ahuja, Yuval Tassa, Leonard Hasenclever, Vu Pham, Tom Erez, Greg Wayne, and Nicolas Heess. 2020. Catch & Carry: reusable neural controllers for vision-guided whole-body tasks. ACM Transactions on Graphics (TOG) 39, 4 (2020), 39--1.
- Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning. 1928--1937.
- Igor Mordatch, Emanuel Todorov, and Zoran Popović. 2012. Discovery of complex behaviors through contact-invariant optimization. ACM Transactions on Graphics (TOG) 31, 4 (2012), 1--8.
- Vu Nguyen and Michael A Osborne. 2019. Knowing the what but not the where in Bayesian optimization. arXiv preprint arXiv:1905.02685 (2019).
- Vu Nguyen, Sebastian Schulze, and Michael Osborne. 2020. Bayesian optimization for iterative learning. Advances in Neural Information Processing Systems 33 (2020).
- Jahng-Hyon Park and Haruhiko Asada. 1994. Concurrent design optimization of mechanical structure and control for high speed robots. (1994).
- Soohwan Park, Hoseok Ryu, Seyoung Lee, Sunmin Lee, and Jehee Lee. 2019. Learning predict-and-simulate policies from unorganized human motion data. ACM Transactions on Graphics (TOG) 38, 6 (2019), 1--11.
- Chandana Paul and Josh C Bongard. 2001. The road less travelled: Morphology in the optimization of biped robot locomotion. In Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the Next Millennium, Vol. 1. IEEE, 226--232.
- Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. 2018a. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics (TOG) 37, 4 (2018), 1--14.
- Xue Bin Peng, Glen Berseth, and Michiel Van de Panne. 2015. Dynamic terrain traversal skills using reinforcement learning. ACM Transactions on Graphics (TOG) 34, 4 (2015), 1--11.
- Xue Bin Peng, Glen Berseth, and Michiel Van de Panne. 2016. Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Transactions on Graphics (TOG) 35, 4 (2016), 1--12.
- Xue Bin Peng, Glen Berseth, KangKang Yin, and Michiel Van De Panne. 2017. DeepLoco: Dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Transactions on Graphics (TOG) 36, 4 (2017), 1--13.
- Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, and Sergey Levine. 2018b. SFV: Reinforcement learning of physical skills from videos. ACM Transactions on Graphics (TOG) 37, 6 (2018).
- Anton C Pil and H Haruhiko Asada. 1996. Integrated structure/control design of mechatronic systems using a recursive experimental optimization method. IEEE/ASME Transactions on Mechatronics 1, 3 (1996), 191--203.
- Carl Edward Rasmussen. 2003. Gaussian processes in machine learning. In Summer School on Machine Learning. Springer, 63--71.
- Charles Schaff, David Yunis, Ayan Chakrabarti, and Matthew R Walter. 2019. Jointly learning to construct and control agents using deep reinforcement learning. In 2019 International Conference on Robotics and Automation (ICRA). IEEE, 9798--9805.
- John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In International Conference on Machine Learning. 1889--1897.
- John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).
- Karl Sims. 1994. Evolving virtual creatures. In Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques. 15--22.
- Jasper Snoek, Hugo Larochelle, and Ryan P Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems. 2951--2959.
- Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Mr Prabhat, and Ryan Adams. 2015. Scalable Bayesian optimization using deep neural networks. In International Conference on Machine Learning. 2171--2180.
- Jialin Song, Yuxin Chen, and Yisong Yue. 2019. A general framework for multi-fidelity Bayesian optimization with Gaussian processes. In The 22nd International Conference on Artificial Intelligence and Statistics. 3158--3167.
- Andrew Spielberg, Brandon Araki, Cynthia Sung, Russ Tedrake, and Daniela Rus. 2017. Functional co-optimization of articulated robots. In 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 5035--5042.
- Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias Seeger. 2010. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th Annual International Conference on Machine Learning.
- Kevin Swersky, Jasper Snoek, and Ryan P Adams. 2013. Multi-task Bayesian optimization. In Advances in Neural Information Processing Systems. 2004--2012.
- Kevin Swersky, Jasper Snoek, and Ryan Prescott Adams. 2014. Freeze-thaw Bayesian optimization. arXiv preprint arXiv:1406.3896 (2014).
- Shion Takeno, Hitoshi Fukuoka, Yuhki Tsukada, Toshiyuki Koyama, Motoki Shiga, Ichiro Takeuchi, and Masayuki Karasuyama. 2019. Multi-fidelity Bayesian optimization with max-value entropy search. arXiv preprint arXiv:1901.08275 (2019).
- Jie Tan, Karen Liu, and Greg Turk. 2011. Stable proportional-derivative controllers. IEEE Computer Graphics and Applications 31, 4 (2011), 34--44.
- Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, and Vincent Vanhoucke. 2018. Sim-to-Real: Learning agile locomotion for quadruped robots. In Proceedings of Robotics: Science and Systems. Pittsburgh, Pennsylvania. https://doi.org/10.15607/RSS.2018.XIV.010
- Yuval Tassa, Tom Erez, and Emanuel Todorov. 2012. Synthesis and stabilization of complex behaviors through online trajectory optimization. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 4906--4913.
- Ethan Tseng, Felix Yu, Yuting Yang, Fahim Mannan, Karl ST Arnaud, Derek Nowrouzezahrai, Jean-François Lalonde, and Felix Heide. 2019. Hyperparameter optimization in black-box image processing using differentiable proxies. ACM Transactions on Graphics 38, 4 (2019), 27--1.
- Michiel Van de Panne and Alexis Lamouret. 1995. Guided optimization for balanced locomotion. In Computer Animation and Simulation '95. Springer, 165--177.
- Miguel G Villarreal-Cervantes, Carlos A Cruz-Villar, Jaime Alvarez-Gallegos, and Edgar A Portilla-Flores. 2012. Robust structure-control design approach for mechatronic systems. IEEE/ASME Transactions on Mechatronics 18, 5 (2012), 1592--1601.
- Pauli Virtanen, Ralf Gommers, Travis E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 3 (2020), 261--272.
- Kevin Wampler, Zoran Popović, and Jovan Popovic. 2014. Generalizing locomotion style to new animals with inverse optimal regression. ACM Transactions on Graphics (TOG) 33, 4 (2014), 1--11.
- Jack M Wang, David J Fleet, and Aaron Hertzmann. 2009. Optimizing walking controllers. In ACM SIGGRAPH Asia 2009 Papers. 1--8.
- Jack M Wang, Samuel R Hamner, Scott L Delp, and Vladlen Koltun. 2012. Optimizing locomotion controllers using biologically-based actuators and objectives. ACM Transactions on Graphics (TOG) 31, 4 (2012), 1--11.
- Tingwu Wang, Renjie Liao, Jimmy Ba, and Sanja Fidler. 2018. NerveNet: Learning structured policy with graph neural networks. In International Conference on Learning Representations.
- Ziyu Wang, Masrour Zoghi, Frank Hutter, David Matheson, Nando De Freitas, et al. 2013. Bayesian optimization in high dimensions via random embeddings. In IJCAI. 1778--1784.
- Jungdam Won, Deepak Gopinath, and Jessica Hodgins. 2020. A scalable approach to control diverse behaviors for physically simulated characters. ACM Transactions on Graphics (TOG) 39, 4 (2020), 33--1.
- Jungdam Won and Jehee Lee. 2019. Learning body shape variation in physics-based characters. ACM Transactions on Graphics (TOG) 38, 6 (2019), 1--12.
- Jia-chi Wu and Zoran Popović. 2010. Terrain-adaptive bipedal locomotion control. ACM Transactions on Graphics (TOG) 29, 4 (2010), 1--10.
- Zhaoming Xie, Hung Yu Ling, Nam Hee Kim, and Michiel van de Panne. 2020. ALLSTEPS: Curriculum-driven learning of stepping stone skills. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation.
- KangKang Yin, Stelian Coros, Philippe Beaudoin, and Michiel Van de Panne. 2008. Continuation methods for adapting simulated skills. In ACM SIGGRAPH 2008 Papers. 1--7.
- KangKang Yin, Kevin Loken, and Michiel Van de Panne. 2007. SIMBICON: Simple biped locomotion control. ACM Transactions on Graphics (TOG) 26, 3 (2007), 105-es.
- Wenhao Yu, Visak CV Kumar, Greg Turk, and C Karen Liu. 2019. Sim-to-real transfer for biped locomotion. (2019).
- Wenhao Yu, Greg Turk, and C Karen Liu. 2018. Learning symmetric and low-energy locomotion. ACM Transactions on Graphics (TOG) 37, 4 (2018), 1--12.