DOI: 10.1145/3486001.3486244

Towards Accelerating Offline RL based Recommender Systems

Published: 22 October 2021

Abstract

Reinforcement learning (RL) optimizes an objective function by learning an optimal policy for taking a sequence of actions in an environment. Offline RL learns a policy from pre-generated traces of an agent’s interactions with an environment, accelerating the agent’s initial learning phase. A real-time deployment of RL-based recommenders across geographies could mix online and offline RL algorithms, exploring new users’ behavior while exploiting old knowledge to learn a recommendation policy; in such a scenario, RL agents are distributed and deployed in multiple locations. In this paper, we share our experience and lessons learned in accelerating our in-house offline RL-based recommender system. The recommender system employs a mix of Batch-Constrained Q-learning (BCQ) and distributional RL algorithms to build policy models. We present several acceleration techniques for this system, including operator fusion, avoidance of performance anti-patterns, heterogeneous deployments, and the design space of synchronous and asynchronous distributed training over the generative and policy models of the algorithm. We show that the presented acceleration techniques can speed up training of the RL agents on A100 GPUs by a factor of 47× over a naive implementation written by an ML practitioner on the same hardware.
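The page does not reproduce the paper's code; as a rough, hypothetical sketch of the operator-fusion idea named in the abstract (the function names and tensor shapes below are illustrative, not the authors'), TorchScript can fuse a chain of pointwise operations into a single GPU kernel:

# Illustrative sketch of operator fusion, not the authors' implementation.
# TorchScript's fuser can compile a chain of pointwise ops (mul, add, relu)
# into one GPU kernel, avoiding intermediate round-trips to device memory.
import torch

def scale_shift_relu_eager(x, w, b):
    # Eager mode: three separate kernels (mul, add, relu), each
    # materializing an intermediate tensor in GPU memory.
    y = x * w
    y = y + b
    return torch.relu(y)

@torch.jit.script
def scale_shift_relu_fused(x, w, b):
    # Scripted: the pointwise chain is eligible for fusion into one kernel.
    return torch.relu(x * w + b)

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4096, 1024, device=device)
    w = torch.randn(1024, device=device)
    b = torch.randn(1024, device=device)
    # Both paths compute the same result; the fused path saves memory traffic.
    assert torch.allclose(scale_shift_relu_eager(x, w, b),
                          scale_shift_relu_fused(x, w, b))

In more recent PyTorch versions, torch.compile provides similar pointwise fusion without explicit scripting.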


Cited By

  • Pay-as-you-Train: Efficient ways of Serverless Training. 2022 IEEE International Conference on Cloud Engineering (IC2E), pp. 116–125. Online publication date: September 2022. https://doi.org/10.1109/IC2E55432.2022.00020



Published In

AIMLSystems '21: Proceedings of the First International Conference on AI-ML Systems
October 2021, 170 pages
ISBN: 9781450385947
DOI: 10.1145/3486001
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Distributed Training
  2. GPU
  3. Offline Reinforcement Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIMLSystems 2021

