Abstract
Implementations of deep reinforcement learning (DRL) often adopt action persistence strategies, in which agents hold each action for a fixed or variable number of steps. The choice of persistence duration usually has a notable effect on the performance of reinforcement learning algorithms. To address the research gap in global dynamic optimal action persistence and its application to multi-agent systems, we propose a novel framework, global dynamic action persistence (GLDAP), which achieves global action persistence adaptation for deep reinforcement learning. We introduce a closed-loop method that learns the estimated value and the corresponding policy of each candidate action persistence. Our experiments show that GLDAP achieves an average performance improvement of 2.5% to 90.7% and 3 to 20 times higher sampling efficiency than several baselines across various single-agent and multi-agent domains. We also validate, through multiple experiments, the ability of GLDAP to determine the optimal action persistence.
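As a rough illustration of what action persistence means in practice, the sketch below repeats a single action for a fixed number of base steps on a toy environment and accumulates the discounted reward over those steps. `ToyLineEnv` and `step_with_persistence` are hypothetical names introduced only for this example. They are not part of GLDAP, and the sketch covers fixed persistence only, not the adaptive selection among candidate persistences that the framework proposes.

```python
from dataclasses import dataclass


@dataclass
class ToyLineEnv:
    """Hypothetical 1-D environment: the agent walks along a line toward a goal."""
    position: int = 0
    goal: int = 5

    def step(self, action: int):
        # action is -1 (left) or +1 (right); each step costs -1 until the goal is reached.
        self.position += action
        done = self.position >= self.goal
        reward = 0.0 if done else -1.0
        return self.position, reward, done


def step_with_persistence(env, action, persistence, gamma=0.99):
    """Repeat `action` for up to `persistence` base steps, stopping early on termination.

    Returns the last observation, the discounted sum of rewards collected over
    the repeated steps, the done flag, and the number of base steps taken.
    """
    total, discount, steps = 0.0, 1.0, 0
    obs, done = None, False
    for _ in range(persistence):
        obs, reward, done = env.step(action)
        total += discount * reward
        discount *= gamma
        steps += 1
        if done:
            break
    return obs, total, done, steps


env = ToyLineEnv()
# Hold the "move right" action for 3 base steps (undiscounted for readability).
obs, ret, done, steps = step_with_persistence(env, action=+1, persistence=3, gamma=1.0)
```

With a persistence of k, the agent makes one decision per k base steps, so a larger k means fewer decisions per episode but coarser control. That trade-off is why the choice of persistence affects performance.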
GLDAP: Global Dynamic Action Persistence Adaptation for Deep Reinforcement Learning