Abstract
In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a set of $N$ agents who can communicate their local data only with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions, each time selecting an arm from $M$ candidates, yet it has access only to local and potentially biased feedback/evaluations of the true reward of each action taken. Learning only locally leads agents to sub-optimal actions, while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution in which agents never share their local observations with a central entity and are allowed to share only a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm, \texttt{Gossip\_UCB}, which couples variants of the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that \texttt{Gossip\_UCB} successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret of order $O(\max\{\mathtt{poly}(N,M)\log T,\ \mathtt{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for all $N$ agents, where $\lambda_2\in(0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose \texttt{Fed\_UCB}, a differentially private version of \texttt{Gossip\_UCB}, in which the agents preserve $\epsilon$-differential privacy of their local data while achieving $O(\max\{\frac{\mathtt{poly}(N,M)}{\epsilon}\log^{2.5} T,\ \mathtt{poly}(N,M)(\log_{\lambda_2^{-1}} N + \log T)\})$ regret.
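To make the coupling of gossiping and UCB more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' \texttt{Gossip\_UCB}. It assumes Bernoulli rewards and takes the true mean of an arm to be the average of the agents' local means. Each round, every agent pulls an arm by UCB on its current estimate, and then one uniformly chosen edge of $G$ performs a randomized pairwise gossip step (both endpoints replace their estimates by the pair's average). Each agent also adds the change in its own local sample mean to its estimate, so the network-wide average of the estimates always equals the average of the local sample means. The names `gossip_ucb_sketch` and `global_best_arm`, the exploration constant 2, and the toy graph are hypothetical, and the differential-privacy layer of \texttt{Fed\_UCB} is omitted.

```python
import math
import random


def gossip_ucb_sketch(edges, local_means, horizon, seed=0):
    """Toy gossip-plus-UCB simulation (hypothetical sketch, not the paper's Gossip_UCB).

    edges       -- undirected edges (i, j) of a connected graph G on agents 0..N-1
    local_means -- local_means[i][k]: Bernoulli mean of arm k observed by agent i;
                   assumption: the true mean of arm k is the average over agents
    horizon     -- number of rounds T
    """
    rng = random.Random(seed)
    n, m = len(local_means), len(local_means[0])
    counts = [[0] * m for _ in range(n)]       # local pull counts per arm
    local_avg = [[0.0] * m for _ in range(n)]  # local sample means per arm
    z = [[0.0] * m for _ in range(n)]          # gossip estimates of the global means

    for t in range(1, horizon + 1):
        # 1) Every agent pulls one arm, chosen by UCB on its gossip estimate.
        for i in range(n):
            untried = [k for k in range(m) if counts[i][k] == 0]
            if untried:
                arm = untried[0]  # pull each arm once before using the index
            else:
                ucb = [z[i][k] + math.sqrt(2.0 * math.log(t) / counts[i][k])
                       for k in range(m)]
                arm = ucb.index(max(ucb))
            reward = 1.0 if rng.random() < local_means[i][arm] else 0.0
            old = local_avg[i][arm]
            counts[i][arm] += 1
            local_avg[i][arm] += (reward - old) / counts[i][arm]
            # Dynamic-consensus correction: inject the change in the local mean,
            # so sum_i z[i][k] == sum_i local_avg[i][k] holds after every round.
            z[i][arm] += local_avg[i][arm] - old

        # 2) Randomized pairwise gossip: one uniformly chosen edge averages.
        a, b = rng.choice(edges)
        for k in range(m):
            z[a][k] = z[b][k] = (z[a][k] + z[b][k]) / 2.0

    return counts, local_avg, z


def global_best_arm(local_means):
    """Arm with the highest mean averaged over agents (assumed reward model)."""
    n, m = len(local_means), len(local_means[0])
    avg = [sum(local_means[i][k] for i in range(n)) / n for k in range(m)]
    return avg.index(max(avg))


edges = [(0, 1), (1, 2)]        # path graph on three agents
local_means = [[0.8, 0.2],      # agent 0 alone would favour arm 0 ...
               [0.2, 0.9],
               [0.2, 0.9]]      # ... but arm 1 is best on average
counts, local_avg, z = gossip_ucb_sketch(edges, local_means, horizon=3000)
print(global_best_arm(local_means))  # → 1
print(counts[0])                     # agent 0's pull counts for arms 0 and 1
```

In this toy instance, agent 0 would settle on arm 0 if it learned alone, whereas arm 1 has the higher mean averaged across agents; gossiping lets agent 0's estimates reflect the other agents' data. Tracking increments of the local sample means, rather than averaging raw rewards, is the design choice that keeps the sum of the estimates equal to the sum of the local sample means after every round.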
"Federated Bandit: A Gossiping Approach" also appears in SIGMETRICS '21: Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems.