Google Vizier: A Service for Black-Box Optimization

ABSTRACT
Any sufficiently complex system acts as a black box when it becomes easier to experiment with than to understand. Hence, black-box optimization has become increasingly important as systems have become more complex. In this paper we describe Google Vizier, a Google-internal service for performing black-box optimization that has become the de facto parameter tuning engine at Google. Google Vizier is used to optimize many of our machine learning models and other systems, and also provides core capabilities to Google's Cloud Machine Learning HyperTune subsystem. We discuss our requirements, infrastructure design, underlying algorithms, and advanced features such as transfer learning and automated early stopping that the service provides.
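The abstract frames Vizier as a parameter tuning service: clients repeatedly ask the service to suggest parameter settings, run their experiment with those settings, and report the measured objective back so the service can refine its next suggestions. The sketch below is a minimal, self-contained illustration of that suggest/evaluate/report loop, not Vizier's actual API; the `ToyStudy` class, its method names, and the random-search suggestion strategy are hypothetical stand-ins for the service's RPC interface and its far more sophisticated (e.g., Bayesian optimization, transfer learning, early stopping) algorithms.

```python
import random

# Illustrative stand-in for a black-box optimization service.
# Vizier itself is a hosted service reached over RPC; random search here
# is only a placeholder for its real suggestion algorithms.
class ToyStudy:
    def __init__(self, search_space):
        self.search_space = search_space  # {name: (low, high)}
        self.trials = []                  # list of (params, objective)

    def suggest(self):
        """Propose parameters for a new trial (random search placeholder)."""
        return {name: random.uniform(lo, hi)
                for name, (lo, hi) in self.search_space.items()}

    def report(self, params, objective):
        """Record the measured objective for a completed trial."""
        self.trials.append((params, objective))

    def best(self):
        """Return the trial with the lowest objective seen so far."""
        return min(self.trials, key=lambda t: t[1])


def expensive_black_box(params):
    # Stand-in for the system being tuned, e.g. training an ML model with
    # the given hyperparameters and returning a validation loss.
    return (params["learning_rate"] - 0.1) ** 2 + (params["batch_size"] - 64) ** 2 / 1e4


study = ToyStudy({"learning_rate": (1e-4, 1.0), "batch_size": (8, 512)})
for _ in range(50):
    params = study.suggest()                  # service proposes parameters
    objective = expensive_black_box(params)   # caller runs the experiment
    study.report(params, objective)           # result is fed back to the service
print("best trial:", study.best())
```

The key design point the paper emphasizes is this separation of concerns: the caller only evaluates the black box, while the service owns the optimization strategy and can swap algorithms without any change to client code.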