Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory

Published: 26 March 2019

Abstract

We consider multi-armed bandit problems in social groups wherein each individual has bounded memory and shares the common goal of learning the best arm/option. We say an individual learns the best option if eventually (as $t \to \infty$) it pulls only the arm with the highest expected reward. While this goal is provably impossible for an isolated individual due to bounded memory, we show that, in social groups, this goal can be achieved easily with the aid of social persuasion (i.e., communication) as long as the communication networks/graphs satisfy some mild conditions. In this work, we model and analyze a type of learning dynamics which are well-observed in social groups. Specifically, under the learning dynamics of interest, an individual sequentially decides on which arm to pull next based on not only its private reward feedback but also the suggestion provided by a randomly chosen neighbor. To deal with the interplay between the randomness in the rewards and in the social interaction, we employ the mean-field approximation method. Considering the possibility that the individuals in the networks may not be exchangeable when the communication networks are not cliques, we go beyond the classic mean-field techniques and apply a refined version of mean-field approximation:

Using coupling we show that, if the communication graph is connected and is either regular or has doubly-stochastic degree-weighted adjacency matrix, with probability → 1 as the social group size N → ∞, every individual in the social group learns the best option.

If the minimum degree of the graph diverges as N → ∞, over an arbitrary but given finite time horizon, the sample paths describing the opinion evolutions of the individuals are asymptotically independent. In addition, the proportions of the population with different opinions converge to the unique solution of a system of ODEs. Interestingly, the obtained system of ODEs is invariant to the structure of the communication graph. In the solution of the obtained ODEs, the proportion of the population holding the correct opinion converges to 1 exponentially fast in time.

Notably, our results hold even if the communication graphs are highly sparse.
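
The abstract describes the learning dynamics only at a high level, so the following is a minimal toy sketch rather than the authors' exact model. It assumes Bernoulli rewards, a ring as the communication graph (connected and 2-regular), one uniformly random individual waking up per step, and a hypothetical update rule in which an individual adopts a neighbor's suggested arm only if pulling that arm yields a reward of 1. The names `ring_neighbors` and `simulate` are illustrative and do not come from the paper.

```python
import random


def ring_neighbors(n, i):
    """Neighbors of node i on a ring (an assumed connected 2-regular graph)."""
    return [(i - 1) % n, (i + 1) % n]


def simulate(n, means, steps, seed=0):
    """Toy social bandit dynamics; returns the fraction holding the best arm per step.

    means[a] is the assumed Bernoulli success probability of arm a.
    """
    rng = random.Random(seed)
    k = len(means)
    # Bounded memory: each individual stores only the index of its preferred arm.
    pref = [rng.randrange(k) for _ in range(n)]
    best = max(range(k), key=lambda a: means[a])
    history = []
    for _ in range(steps):
        i = rng.randrange(n)                  # a uniformly random individual wakes up
        j = rng.choice(ring_neighbors(n, i))  # it consults a uniformly random neighbor
        arm = pref[j]                         # the neighbor suggests its current arm
        reward = 1 if rng.random() < means[arm] else 0  # private reward feedback
        if reward == 1:
            pref[i] = arm                     # assumed rule: adopt only after a success
        history.append(sum(p == best for p in pref) / n)
    return history


if __name__ == "__main__":
    trace = simulate(n=200, means=[0.3, 0.8], steps=20000, seed=1)
    print("final fraction holding the best arm:", trace[-1])
```

Because this sketch has no independent exploration step, an arm that disappears from the population can never return; it is meant only to illustrate how private feedback and a neighbor's suggestion can be combined with a single stored arm index per individual.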
