
A Reinforcement Learning Approach for Interdomain Routing with Link Prices

Published: 09 March 2015

Abstract

In today’s Internet, the commercial aspects of routing are gaining importance. Current technology allows Internet Service Providers (ISPs) to renegotiate contracts online to maximize profits. Changing link prices will influence interdomain routing policies that are now driven by monetary aspects as well as global resource and performance optimization. In this article, we consider an interdomain routing game in which the ISP’s action is to set the price for its transit links. Assuming a cheapest path routing scheme, the optimal action is the price setting that yields the highest utility (i.e., profit) and depends both on the network load and the actions of other ISPs. We adapt a continuous and a discrete action learning automaton (LA) to operate in this framework as a tool that can be used by ISP operators to learn optimal price setting. In our model, agents representing different ISPs learn only on the basis of local information and do not need any central coordination or sensitive information exchange. Simulation results show that a single ISP employing LAs is able to learn the optimal price in a stationary environment. By introducing a selective exploration rule, LAs are also able to operate in nonstationary environments. When two ISPs employ LAs, we show that they converge to stable and fair equilibrium strategies.
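To make the learning setup concrete, the following is a minimal sketch of a discrete-action learning automaton choosing a transit price. It uses the standard linear reward-inaction (L_R-I) update from the LA literature, not necessarily the exact scheme of this article, and the `demand`/`profit` functions are an invented stationary environment in which routed traffic falls off as the link price rises (cheapest-path customers defect to other routes).

```python
import random

class DiscreteLA:
    """Linear reward-inaction (L_R-I) automaton over a discrete set of
    candidate link prices. Illustrative only: the environment model and
    parameter values below are assumptions, not taken from the article."""

    def __init__(self, prices, lr=0.05):
        self.prices = prices
        self.lr = lr  # learning rate
        self.probs = [1.0 / len(prices)] * len(prices)

    def choose(self, rng):
        # Sample an action index from the current probability vector.
        r, acc = rng.random(), 0.0
        for i, p in enumerate(self.probs):
            acc += p
            if r <= acc:
                return i
        return len(self.probs) - 1

    def update(self, i, reward):
        # reward in [0, 1]; shift probability mass toward the chosen
        # action proportionally to the reward (L_R-I update).
        for j in range(len(self.probs)):
            if j == i:
                self.probs[j] += self.lr * reward * (1.0 - self.probs[j])
            else:
                self.probs[j] -= self.lr * reward * self.probs[j]

def demand(price):
    # Hypothetical stationary environment: traffic routed over this
    # link decreases linearly as the price rises.
    return max(0.0, 1.0 - 0.8 * price)

def profit(price):
    # The ISP's utility: price charged times traffic carried.
    return price * demand(price)

rng = random.Random(0)
la = DiscreteLA(prices=[0.2, 0.6, 1.0])
max_profit = max(profit(p) for p in la.prices)
for _ in range(5000):
    i = la.choose(rng)
    la.update(i, profit(la.prices[i]) / max_profit)  # normalize to [0, 1]
best = la.prices[max(range(len(la.probs)), key=la.probs.__getitem__)]
```

The automaton learns purely from its own (price, profit) feedback, matching the article's premise that agents need only local information and no coordination with other ISPs; the two-ISP game arises when each agent's `demand` depends on the other's current price.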



• Published in

  ACM Transactions on Autonomous and Adaptive Systems, Volume 10, Issue 1 (March 2015), 178 pages
  ISSN: 1556-4665
  EISSN: 1556-4703
  DOI: 10.1145/2744297
  Copyright © 2015 ACM

  Publisher

  Association for Computing Machinery, New York, NY, United States

  Publication History

  • Received: 1 March 2013
  • Revised: 1 November 2014
  • Accepted: 1 January 2015
  • Published: 9 March 2015

        Qualifiers

        • research-article
        • Research
        • Refereed
