Abstract
In today’s Internet, the commercial aspects of routing are gaining importance. Current technology allows Internet Service Providers (ISPs) to renegotiate contracts online to maximize profit. Changing link prices influences interdomain routing policies, which are now driven by monetary considerations as well as by global resource and performance optimization. In this article, we consider an interdomain routing game in which each ISP’s action is to set the price of its transit links. Assuming a cheapest-path routing scheme, the optimal action is the price setting that yields the highest utility (i.e., profit); it depends both on the network load and on the actions of other ISPs. We adapt a continuous and a discrete action learning automaton (LA) to operate in this framework as a tool that ISP operators can use to learn an optimal price setting. In our model, agents representing different ISPs learn only on the basis of local information and need no central coordination or exchange of sensitive information. Simulation results show that a single ISP employing LAs is able to learn the optimal price in a stationary environment. By introducing a selective exploration rule, LAs are also able to operate in nonstationary environments. When two ISPs employ LAs, we show that they converge to stable and fair equilibrium strategies.
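The learning mechanism described above can be illustrated with a minimal sketch of a discrete-action learning automaton using the classic linear reward-inaction (L_R-I) update. The candidate prices, the profit function, and all names below are illustrative assumptions, not taken from the article: the automaton keeps a probability vector over candidate transit prices, samples a price, observes a profit-based reward, and shifts probability mass toward well-rewarded prices.

```python
import random

class DiscreteLA:
    """Discrete learning automaton with a linear reward-inaction (L_R-I) update."""

    def __init__(self, actions, lr=0.02):
        self.actions = actions
        self.lr = lr  # learning rate
        # start with a uniform probability over the candidate prices
        self.probs = [1.0 / len(actions)] * len(actions)

    def choose(self):
        # sample an action index according to the current probability vector
        r, acc = random.random(), 0.0
        for i, p in enumerate(self.probs):
            acc += p
            if r < acc:
                return i
        return len(self.probs) - 1

    def update(self, idx, reward):
        # L_R-I: reinforce the chosen action in proportion to reward in [0, 1];
        # the probability vector stays normalized by construction
        for i in range(len(self.probs)):
            if i == idx:
                self.probs[i] += self.lr * reward * (1.0 - self.probs[i])
            else:
                self.probs[i] -= self.lr * reward * self.probs[i]

# Hypothetical stand-in for the ISP's utility: demand falls as the transit
# price rises, and profit is price times demand.
def profit(price):
    demand = max(0.0, 1.0 - 0.6 * price)
    return price * demand

prices = [0.2, 0.5, 0.8, 1.1, 1.4]  # candidate transit prices (assumed)
la = DiscreteLA(prices, lr=0.02)
random.seed(0)
for _ in range(5000):
    i = la.choose()
    la.update(i, profit(prices[i]) / 0.42)  # reward normalized into [0, 1]
```

In this single-agent setting the environment is stationary, so the probability vector concentrates on one price as the automaton converges; handling nonstationary loads or competing ISPs would additionally require the selective exploration rule discussed in the article.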
A Reinforcement Learning Approach for Interdomain Routing with Link Prices