Abstract
We consider multi-armed bandit problems in social groups wherein each individual has bounded memory and shares the common goal of learning the best arm/option. We say an individual learns the best option if eventually (as $t \to \infty$) it pulls only the arm with the highest expected reward. While this goal is provably impossible for an isolated individual due to bounded memory, we show that, in social groups, it can be achieved easily with the aid of social persuasion (i.e., communication) as long as the communication networks/graphs satisfy some mild conditions. In this work, we model and analyze a type of learning dynamics that is well observed in social groups. Specifically, under the learning dynamics of interest, an individual sequentially decides which arm to pull next based not only on its private reward feedback but also on the suggestion provided by a randomly chosen neighbor. To deal with the interplay between the randomness in the rewards and in the social interaction, we employ the \emph{mean-field approximation} method. Since the individuals in the network may not be exchangeable when the communication graph is not a clique, we go beyond the classic mean-field techniques and apply a refined version of mean-field approximation:
Using a coupling argument, we show that if the communication graph is connected and either is regular or has a doubly stochastic degree-weighted adjacency matrix, then, with probability tending to 1 as the social group size $N \to \infty$, every individual in the social group learns the best option.
If the minimum degree of the graph diverges as $N \to \infty$, then, over an arbitrary but fixed finite time horizon, the sample paths describing the opinion evolutions of the individuals are asymptotically independent. In addition, the proportions of the population with different opinions converge to the unique solution of a system of ODEs. Interestingly, the obtained system of ODEs is invariant to the structure of the communication graph. In the solution of the obtained ODEs, the proportion of the population holding the correct opinion converges to 1 exponentially fast in time.
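To illustrate what "exponentially fast in time" means here (the specific ODE system is derived in the paper; the form below is only a generic example with an illustrative rate constant $c > 0$), if the fraction $x(t)$ of the population holding the correct opinion satisfies a drift of the form
\[
\dot{x}(t) = c\,\bigl(1 - x(t)\bigr),
\]
then the error decays geometrically:
\[
1 - x(t) = \bigl(1 - x(0)\bigr)\,e^{-ct} \longrightarrow 0 \quad \text{as } t \to \infty.
\]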
Notably, our results hold even if the communication graphs are highly sparse.
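The dynamics described above can be sketched in simulation. The toy protocol below is an assumption-laden stand-in, not the paper's exact rule: each agent's entire memory is a single arm index (its "opinion"), the graph is a ring (connected and regular, matching the theorem's conditions), and each round an agent tries the arm suggested by a uniformly random neighbor, keeping it only if the pull pays off and otherwise re-checking its own arm. All parameters (`N`, `T`, the Bernoulli means `p`) are illustrative.

```python
import random

def simulate(N=1000, T=200, p=(0.3, 0.7), seed=0):
    """Toy bounded-memory social bandit on a ring; returns the final
    fraction of agents whose opinion is the better arm (arm 1)."""
    rng = random.Random(seed)
    # Bounded local memory: each agent remembers only one arm index.
    opinion = [rng.randrange(2) for _ in range(N)]
    for _ in range(T):
        nxt = opinion[:]  # synchronous update of all opinions
        for i in range(N):
            j = (i + rng.choice((-1, 1))) % N   # random neighbor on the ring
            candidate = opinion[j]              # neighbor's suggestion
            if rng.random() < p[candidate]:     # suggested arm paid off:
                nxt[i] = candidate              #   adopt it
            elif rng.random() >= p[opinion[i]]: # own arm also failed a check:
                nxt[i] = 1 - opinion[i]         #   switch away from it
        opinion = nxt
    return sum(opinion) / N

frac = simulate()  # a clear majority ends up on the better arm
```

Even this crude rule drives a clear majority of the population to the better arm; the paper's analysis shows that, under its actual dynamics and graph conditions, the correct-opinion fraction converges all the way to 1.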
Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory. In SIGMETRICS '19: Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems.