ABSTRACT
Maximum Margin Matrix Factorization (MMMF) was recently suggested (Srebro et al., 2005) as a convex, infinite-dimensional alternative to low-rank approximations and standard factor models. MMMF can be formulated as a semi-definite program (SDP) and learned using standard SDP solvers. However, current SDP solvers can only handle MMMF problems on matrices with dimensions up to a few hundred. Here, we investigate a direct gradient-based optimization method for MMMF and demonstrate it on large collaborative prediction problems. We compare against results obtained by Marlin (2004) and find that MMMF substantially outperforms all nine methods he tested.
References
- Azar, Y., Fiat, A., Karlin, A. R., McSherry, F., & Saia, J. (2001). Spectral analysis of data. ACM Symposium on Theory of Computing (pp. 619--626).
- Billsus, D., & Pazzani, M. J. (1998). Learning collaborative information filters. Proc. 15th International Conf. on Machine Learning (pp. 46--54). Morgan Kaufmann, San Francisco, CA.
- Canny, J. (2004). GaP: a factor model for discrete data. SIGIR '04: Proceedings of the 27th annual international conference on Research and development in information retrieval (pp. 122--129). Sheffield, United Kingdom: ACM Press.
- Collins, M., Dasgupta, S., & Schapire, R. (2002). A generalization of principal component analysis to the exponential family. Advances in Neural Information Processing Systems 14.
- Fazel, M., Hindi, H., & Boyd, S. P. (2001). A rank minimization heuristic with application to minimum order system approximation. Proceedings American Control Conference.
- Hofmann, T. (2004). Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst., 22, 89--115.
- Lee, D., & Seung, H. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788--791.
- Marlin, B. (2004). Collaborative filtering: A machine learning perspective. Master's thesis, University of Toronto, Computer Science Department.
- Marlin, B., & Zemel, R. S. (2004). The multiple multiplicative factor model for collaborative filtering. Proceedings of the 21st International Conference on Machine Learning.
- Nocedal, J., & Wright, S. J. (1999). Numerical optimization. Springer-Verlag.
- Rennie, J. D. M., & Srebro, N. (2005). Loss functions for preference levels: Regression with discrete ordered labels. Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling.
- Shewchuk, J. R. (1994). An introduction to the conjugate gradient method without the agonizing pain. http://www.cs.cmu.edu/~jrs/jrspapers.html.
- Srebro, N., & Jaakkola, T. (2003). Weighted low rank approximation. 20th International Conference on Machine Learning.
- Srebro, N., Rennie, J. D. M., & Jaakkola, T. (2005). Maximum margin matrix factorization. Advances in Neural Information Processing Systems 17.
- Srebro, N., & Schraibman, A. (2005). Rank, trace-norm and max-norm. Proceedings of the 18th Annual Conference on Learning Theory.
- Zhang, T., & Oles, F. J. (2001). Text categorization based on regularized linear classification methods. Information Retrieval, 4, 5--31.