ABSTRACT
We propose a new probabilistic programming language for the design and analysis of perception systems, especially those based on machine learning. Specifically, we consider the problems of training a perception system to handle rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs and sampling these to generate specialized training and test sets. More generally, such languages can be used for cyber-physical systems and robotics to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems like autonomous cars and robots, whose environment is a scene, a configuration of physical objects and agents. We design a domain-specific language, Scenic, for describing scenarios that are distributions over scenes. As a probabilistic programming language, Scenic allows assigning distributions to features of the scene, as well as declaratively imposing hard and soft constraints over the scene. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic's domain-specific syntax. Finally, we apply Scenic in a case study on a convolutional neural network designed to detect cars in road images, improving its performance beyond that achieved by state-of-the-art synthetic data generation methods.
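To make the idea of a scenario as a distribution over scenes concrete, the following is a minimal Python sketch, not Scenic syntax and not the specialized sampling techniques the paper develops. It assigns distributions to the features of each car, then uses plain rejection sampling to impose one hard constraint (cars must not be too close together) and one soft constraint (cars usually face roughly forward). All names, ranges, and thresholds here (`Car`, `sample_scene`, the 4 m gap, the 0.9 probability) are hypothetical choices for illustration, and treating a soft constraint as one that is checked only with some probability is an assumption of this sketch.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Car:
    x: float        # lateral offset in metres (hypothetical units)
    y: float        # distance ahead of the camera in metres
    heading: float  # degrees relative to the road direction


def sample_car(rng):
    """Draw each feature of a car from its own distribution."""
    return Car(
        x=rng.uniform(-5.0, 5.0),
        y=rng.uniform(5.0, 50.0),
        heading=rng.gauss(0.0, 10.0),
    )


def far_enough(a, b, min_gap=4.0):
    """Hard constraint: two cars must be at least min_gap metres apart."""
    return math.hypot(a.x - b.x, a.y - b.y) >= min_gap


def facing_forward(car, max_angle=15.0):
    """Soft constraint: a car should usually face roughly along the road."""
    return abs(car.heading) <= max_angle


def sample_scene(rng, n_cars=2, soft_prob=0.9, max_tries=1000):
    """Rejection-sample a scene: redraw until the constraints are met."""
    for _ in range(max_tries):
        cars = [sample_car(rng) for _ in range(n_cars)]
        # The hard constraint is always enforced.
        pairs_ok = all(
            far_enough(a, b)
            for i, a in enumerate(cars)
            for b in cars[i + 1:]
        )
        if not pairs_ok:
            continue
        # The soft constraint is enforced only with probability soft_prob
        # (an assumption of this sketch).
        if rng.random() < soft_prob and not all(map(facing_forward, cars)):
            continue
        return cars
    raise RuntimeError("no valid scene found within max_tries")


if __name__ == "__main__":
    rng = random.Random(0)
    for car in sample_scene(rng, n_cars=3):
        print(car)
```

Drawing many scenes from `sample_scene` would yield a specialized training or test set in the sense the abstract describes. Rejection sampling becomes slow when constraints are rarely satisfied, which is one reason to exploit the structure of the scenario rather than sample blindly.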
Index Terms
Scenic: a language for scenario specification and scene generation