research-article

Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

Published: 10 August 2015

ABSTRACT

Despite attractive qualities such as high prediction accuracy, the ability to quantify uncertainty, and resistance to over-fitting, Bayesian Matrix Factorization has not been widely adopted because of the prohibitive cost of inference. In this paper, we propose a scalable distributed Bayesian matrix factorization algorithm using stochastic gradient MCMC. Our algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, matches the prediction accuracy of standard MCMC methods like Gibbs sampling while remaining as fast and simple as stochastic gradient descent. In our experiments, we show that our algorithm achieves the same level of prediction accuracy as Gibbs sampling an order of magnitude faster. We also show that our method reduces the prediction error as fast as distributed stochastic gradient descent, achieving a 4.1% improvement in RMSE for the Netflix dataset and a 1.8% improvement for the Yahoo music dataset.
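
To make the idea of stochastic gradient Langevin dynamics for matrix factorization concrete, the following is a minimal single-machine sketch, not the distributed algorithm of the paper. It assumes a Gaussian likelihood with fixed precision `tau` and zero-mean Gaussian priors with fixed precision `lam` on all factor entries; these values, the toy data, and the names `sgld_mf`, `rmse`, and `dot` are illustrative assumptions that do not come from the abstract. Each step follows half a step of a minibatch estimate of the log-posterior gradient and adds Gaussian noise whose variance equals the step size `eps`.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rmse(ratings, U, V):
    """Root-mean-square error of the predictions U[i] . V[j]."""
    se = sum((r - dot(U[i], V[j])) ** 2 for i, j, r in ratings)
    return math.sqrt(se / len(ratings))


def sgld_mf(ratings, n_users, n_items, rank=2, eps=1e-3, tau=10.0, lam=1.0,
            batch=10, steps=2000, seed=0):
    """Return the factors (U, V) after `steps` SGLD updates (hypothetical helper)."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    scale = len(ratings) / batch   # N / n: rescales the minibatch gradient
    noise_sd = math.sqrt(eps)      # injected noise has variance eps
    for _ in range(steps):
        # Gradient of the log prior N(0, 1/lam) for every factor entry.
        gU = [[-lam * x for x in row] for row in U]
        gV = [[-lam * x for x in row] for row in V]
        # Add the rescaled minibatch gradient of the Gaussian log-likelihood.
        for i, j, r in rng.sample(ratings, batch):
            err = r - dot(U[i], V[j])
            for d in range(rank):
                gU[i][d] += scale * tau * err * V[j][d]
                gV[j][d] += scale * tau * err * U[i][d]
        # Langevin step: half a step along the gradient plus Gaussian noise.
        for M, G in ((U, gU), (V, gV)):
            for row, grow in zip(M, G):
                for d in range(rank):
                    row[d] += 0.5 * eps * grow[d] + rng.gauss(0.0, noise_sd)
    return U, V


# Toy data (an assumption for illustration): a fully observed 6 x 5 matrix of exact rank 2.
data_rng = random.Random(1)
true_U = [[data_rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(6)]
true_V = [[data_rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(5)]
ratings = [(i, j, dot(true_U[i], true_V[j])) for i in range(6) for j in range(5)]

U0, V0 = sgld_mf(ratings, 6, 5, steps=0)      # initial factors only
U1, V1 = sgld_mf(ratings, 6, 5, steps=2000)   # factors after 2000 SGLD steps
print(rmse(ratings, U0, V0), rmse(ratings, U1, V1))
```

The sketch returns only the final sample. A Bayesian prediction would normally average `dot(U[i], V[j])` over many samples collected after a burn-in period, which is where the uncertainty estimates mentioned above come from.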



    • Published in

      KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
      August 2015
      2378 pages
      ISBN: 9781450336642
      DOI: 10.1145/2783258

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      KDD '15 Paper Acceptance Rate: 160 of 819 submissions, 20%
      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
