
A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint

Published: 19 April 2021

Abstract

In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student could fail to cooperate effectively with others even by following a teacher’s suggested actions, as the policies of all agents can change before convergence. When the number of times that agents communicate with one another is limited (i.e., there are budget constraints), an advising strategy that uses actions as advice could be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning with budget constraints. In PSAF, each Q-learner can decide when to ask for and share its Q-values. We perform experiments in three typical multi-agent learning problems. The evaluation results indicate that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.
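To make the mechanism concrete, the following is a minimal, hypothetical sketch of a budget-constrained Q-value-sharing learner in the spirit of PSAF. The class name, the visit-count confidence heuristic, and the max-merge rule for received Q-values are all illustrative assumptions; the abstract does not specify PSAF's actual asking, sharing, or merging criteria.

```python
import random
from collections import defaultdict

class BudgetedQLearner:
    """Illustrative sketch of a Q-learner that asks for and shares
    Q-values under communication budgets. The asking/sharing heuristics
    (visit-count thresholds) and the merge rule are hypothetical
    stand-ins, not the paper's actual PSAF criteria."""

    def __init__(self, actions, ask_budget=100, share_budget=100,
                 alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)      # (state, action) -> Q-value
        self.visits = defaultdict(int)   # state -> visit count
        self.actions = actions
        self.ask_budget = ask_budget     # remaining "ask" communications
        self.share_budget = share_budget # remaining "share" communications
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def wants_advice(self, state, threshold=5):
        # Partaker side: ask while budget remains and the state is
        # still poorly explored (hypothetical confidence measure).
        return self.ask_budget > 0 and self.visits[state] < threshold

    def can_share(self, state, threshold=5):
        # Sharer side: share only well-visited states, while budget lasts.
        return self.share_budget > 0 and self.visits[state] >= threshold

    def share_qvalues(self, state):
        # Spend one unit of the sharing budget and hand over this
        # state's full row of Q-values (rather than a single action).
        self.share_budget -= 1
        return {a: self.q[(state, a)] for a in self.actions}

    def receive_qvalues(self, state, shared):
        # Spend one unit of the asking budget and merge the sharer's
        # Q-values, here by keeping the larger estimate per action
        # (one of several plausible merge rules).
        self.ask_budget -= 1
        for a, v in shared.items():
            self.q[(state, a)] = max(self.q[(state, a)], v)

    def act(self, state):
        # Standard epsilon-greedy action selection.
        self.visits[state] += 1
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(s2, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next
                                        - self.q[(s, a)])
```

The key contrast with action-advising is visible in `share_qvalues`: the sharer transmits its value estimates for a state, which the partaker can keep exploiting as its own policy evolves, instead of a single suggested action that may become stale before convergence.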

