ABSTRACT
Despite attractive qualities such as high prediction accuracy, the ability to quantify uncertainty, and resistance to over-fitting, Bayesian matrix factorization has not been widely adopted because of the prohibitive cost of inference. In this paper, we propose a scalable distributed Bayesian matrix factorization algorithm using stochastic gradient MCMC. Our algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, not only matches the prediction accuracy of standard MCMC methods like Gibbs sampling, but is also as fast and simple as stochastic gradient descent. In our experiments, we show that our algorithm reaches the same level of prediction accuracy as Gibbs sampling an order of magnitude faster. We also show that our method reduces the prediction error as fast as distributed stochastic gradient descent, achieving a 4.1% improvement in RMSE on the Netflix dataset and a 1.8% improvement on the Yahoo music dataset.
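As a rough illustration of the idea behind stochastic gradient Langevin dynamics applied to matrix factorization, the following single-machine sketch may help. It is not the distributed algorithm proposed in the paper. It assumes a simple model with Gaussian rating noise of fixed precision `tau` and zero-mean Gaussian priors of fixed precision `lam` on all factors; the function names (`sgld_step`, `rmse`, `demo`), the synthetic data, and every hyperparameter value are hypothetical choices made for this example only.

```python
import math
import random


def sgld_step(U, V, batch, n_total, eps, tau=1.0, lam=1.0, rng=random):
    """One SGLD update of the factor matrices U (users) and V (items), in place.

    batch   : list of (user, item, rating) triples sampled from the data
    n_total : total number of observed ratings (rescales the minibatch gradient)
    eps     : step size; the injected Gaussian noise has variance eps
    tau     : assumed rating-noise precision (illustrative value)
    lam     : assumed Gaussian prior precision (illustrative value)
    """
    scale = n_total / len(batch)
    # Gradient of the log prior: -lam * theta for every parameter.
    grad_U = [[-lam * x for x in row] for row in U]
    grad_V = [[-lam * x for x in row] for row in V]
    # Rescaled minibatch gradient of the Gaussian log likelihood.
    for i, j, r in batch:
        err = r - sum(a * b for a, b in zip(U[i], V[j]))
        for k in range(len(U[i])):
            grad_U[i][k] += scale * tau * err * V[j][k]
            grad_V[j][k] += scale * tau * err * U[i][k]
    # Langevin move: half a step along the gradient plus N(0, eps) noise.
    sd = math.sqrt(eps)
    for M, G in ((U, grad_U), (V, grad_V)):
        for row, grad_row in zip(M, G):
            for k in range(len(row)):
                row[k] += 0.5 * eps * grad_row[k] + rng.gauss(0.0, sd)


def rmse(U, V, data):
    """Root mean squared error of the predictions U[i] . V[j] on data."""
    se = sum((r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
             for i, j, r in data)
    return math.sqrt(se / len(data))


def demo(seed=0, n_users=20, n_items=15, rank=2, steps=3000):
    """Run SGLD on small synthetic ratings; return (RMSE before, RMSE after)."""
    rng = random.Random(seed)
    true_U = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_users)]
    true_V = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_items)]
    # Fully observed synthetic ratings with additive Gaussian noise.
    data = [(i, j, sum(a * b for a, b in zip(true_U[i], true_V[j]))
             + rng.gauss(0, 0.5))
            for i in range(n_users) for j in range(n_items)]
    # Small random initialization of the sampled factors.
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    before = rmse(U, V, data)
    for _ in range(steps):
        batch = rng.sample(data, 30)
        sgld_step(U, V, batch, len(data), eps=0.002, rng=rng)
    return before, rmse(U, V, data)
```

Calling `demo()` returns the training RMSE of the random initialization and of the final SGLD sample. In practice one would average predictions over many samples collected after a burn-in period rather than use a single sample, and the step size would typically be decayed over time.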
Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC