Probabilistic Policy Reuse for Safe Reinforcement Learning

Abstract
This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse with teacher advice for safe exploration in dangerous, continuous state-action reinforcement learning problems whose dynamics are reasonably smooth and whose state space is Euclidean. The algorithm uses a monotonically increasing risk function that estimates, for a given state, the probability of ending in failure; this risk is defined in terms of how far the state lies from the region of the state space already known to the learning agent. Probabilistic Policy Reuse is then used to safely balance the exploitation of previously learned knowledge, the exploration of new actions, and requests for teacher advice in parts of the state space considered dangerous. Specifically, the π-reuse exploration strategy is used. Through experiments in the helicopter hover task and a business management problem, we show that the π-reuse exploration strategy can completely avoid visits to undesirable situations while maintaining the performance (in terms of the classical long-term accumulated reward) of the final policy.
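The two ingredients described above — a risk function that grows with the distance to the known part of the state space, and a π-reuse-style choice among past policy, exploration, and teacher advice — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exponential risk shape and the threshold, `psi`, and `epsilon` values are assumptions made for the example.

```python
import math
import random

def risk(state, known_states):
    """Monotonically increasing risk in [0, 1): grows with the Euclidean
    distance from `state` to the closest state the agent has visited.
    The exponential shape is an illustrative assumption."""
    d = min(math.dist(state, k) for k in known_states)
    return 1.0 - math.exp(-d)  # 0 exactly when the state is already known

def pi_reuse_action(state, known_states, past_policy, greedy_policy,
                    teacher, psi=0.5, risk_threshold=0.8, epsilon=0.1):
    """pi-reuse-style action selection (sketch): ask the teacher in risky
    regions; otherwise reuse a past policy with probability `psi`, explore
    randomly with probability `epsilon`, else exploit current knowledge."""
    if risk(state, known_states) > risk_threshold:
        return teacher(state)             # safe advice in dangerous regions
    if random.random() < psi:
        return past_policy(state)         # exploit a past similar policy
    if random.random() < epsilon:
        return random.uniform(-1.0, 1.0)  # explore a random continuous action
    return greedy_policy(state)           # exploit the current learned policy
```

For example, a state far from every visited state gets risk close to 1, so action selection falls through to the teacher, which is how unsafe exploration is avoided.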