Abstract
This paper proposes a reinforcement learning approach for designing feedback neural network controllers for nonlinear systems. Given a Signal Temporal Logic (STL) specification that the system must satisfy over a set of initial conditions, the neural network parameters are tuned to maximize the degree to which the STL formula is satisfied. The framework is based on a max-min formulation of the robustness of the STL formula: the maximization is solved through a Lagrange multipliers method, while the minimization corresponds to a falsification problem. We present results on a vehicle model and a quadrotor model and demonstrate that our approach reduces training time by more than 50 percent compared to the baseline approach.
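The max-min structure described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar plant, the single feedback gain standing in for the neural network parameters, the specification "eventually |x| < eps", and the helper names `simulate`, `robustness_eventually_near_zero`, `falsify`, and `tune`. The outer maximization here is a plain search over candidate gains, not the Lagrange multipliers method the paper uses, and the inner minimization is a grid search over initial conditions rather than a real falsification tool.

```python
import numpy as np


def simulate(x0, gain, steps=50, dt=0.1):
    """Euler rollout of the hypothetical closed loop x' = -gain * x."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x + dt * (-gain * x)
        xs.append(x)
    return np.array(xs)


def robustness_eventually_near_zero(traj, eps):
    """Robustness of 'eventually |x| < eps': max over time of (eps - |x_t|).

    A positive value means the sampled trajectory satisfies the formula.
    """
    return float(np.max(eps - np.abs(traj)))


def falsify(gain, inits, eps):
    """Inner minimization: the initial condition with the lowest robustness."""
    rhos = [robustness_eventually_near_zero(simulate(x0, gain), eps) for x0 in inits]
    i = int(np.argmin(rhos))
    return float(inits[i]), rhos[i]


def tune(gains, inits, eps):
    """Outer maximization: the candidate gain with the best worst-case robustness."""
    worst = [falsify(g, inits, eps)[1] for g in gains]
    i = int(np.argmax(worst))
    return gains[i], worst[i]


if __name__ == "__main__":
    inits = np.linspace(-1.0, 1.0, 21)
    gain, rho = tune([0.5, 1.0, 2.0], inits, eps=0.1)
    print(f"selected gain: {gain}, worst-case robustness: {rho:.4f}")
```

In this sketch a positive worst-case robustness only certifies the sampled initial conditions; covering the full set of initial conditions is what the falsification step in the paper's formulation is meant to address.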
Worst-case Satisfaction of STL Specifications Using Feedforward Neural Network Controllers: A Lagrange Multipliers Approach