research-article

GLDAP: Global Dynamic Action Persistence Adaptation for Deep Reinforcement Learning

Published: 28 May 2023

Abstract

In implementations of deep reinforcement learning (DRL), action persistence strategies are often adopted: agents maintain their actions for a fixed or variable number of steps. The chosen persistence duration usually has a notable effect on the performance of reinforcement learning algorithms. To address the research gap of globally dynamic optimal action persistence and its application to multi-agent systems, we propose a novel framework, global dynamic action persistence (GLDAP), which achieves global action-persistence adaptation for deep reinforcement learning. We introduce a closed-loop method that learns the estimated value and the corresponding policy for each candidate action persistence. Our experiments show that GLDAP achieves performance improvements of 2.5% to 90.7% on average and 3 to 20 times higher sampling efficiency over several baselines across various single-agent and multi-agent domains. We also validate, through multiple experiments, GLDAP's ability to determine the optimal action persistence.
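The core mechanism the abstract builds on, fixed action persistence (often called action repeat or frame skip), can be illustrated with a minimal sketch. The code below assumes a Gym-style `reset`/`step` interface; `ToyEnv`, the persistence parameter `k`, and the placeholder policy are illustrative stand-ins, not the paper's GLDAP method, which adapts `k` dynamically rather than fixing it.

```python
class ToyEnv:
    """A 1-D chain: the agent moves left or right; reaching position 5 ends the episode."""
    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action in {-1, +1}
        self.pos += action
        done = self.pos >= 5
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

def rollout_with_persistence(env, policy, k, max_steps=100):
    """Query the policy only once per decision and repeat the chosen action
    for k environment steps (fixed action persistence)."""
    obs = env.reset()
    total_reward, decisions, t = 0.0, 0, 0
    while t < max_steps:
        action = policy(obs)
        decisions += 1
        for _ in range(k):  # hold the action for k steps
            obs, r, done = env.step(action)
            total_reward += r
            t += 1
            if done or t >= max_steps:
                return total_reward, decisions
    return total_reward, decisions

policy = lambda obs: 1  # always move right (placeholder policy)
reward, decisions = rollout_with_persistence(ToyEnv(), policy, k=5)
print(reward, decisions)  # one decision suffices to cover the 5-step chain
```

With `k=1` the same rollout needs five policy queries; with `k=5` it needs one, which is the sampling-efficiency trade-off the abstract refers to.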




• Published in

  ACM Transactions on Autonomous and Adaptive Systems, Volume 18, Issue 2
  June 2023, 139 pages
  ISSN: 1556-4665
  EISSN: 1556-4703
  DOI: 10.1145/3599693


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 28 May 2023
            • Online AM: 3 April 2023
            • Accepted: 29 March 2023
            • Revised: 14 January 2023
            • Received: 17 November 2021
