ABSTRACT
There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
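The probabilistic max-pooling described above can be illustrated with a small sketch. In each non-overlapping pooling block, the detection units and an explicit "off" state compete through a softmax, so at most one detection unit fires and the pooling unit is on exactly when some detection unit in its block is on. The function name, block size, and NumPy-based sampling below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def prob_max_pool(activations, block=2, rng=None):
    """Sample a detection layer and pooling layer under probabilistic
    max-pooling: within each (block x block) region, at most one
    detection unit is active, and the pooling unit is active iff
    any detection unit in its block is active."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = activations.shape
    det = np.zeros((H, W))                      # sampled detection layer
    pool = np.zeros((H // block, W // block))   # sampled pooling layer
    for i in range(0, H, block):
        for j in range(0, W, block):
            e = np.exp(activations[i:i + block, j:j + block]).ravel()
            # softmax over the block's units plus an explicit "off" state
            p = np.append(e, 1.0) / (e.sum() + 1.0)
            k = rng.choice(block * block + 1, p=p)
            if k < block * block:               # one detection unit fires
                det[i + k // block, j + k % block] = 1.0
                pool[i // block, j // block] = 1.0
    return det, pool
```

Because the "off" state is included in the softmax, the competition is probabilistically sound: the block's states form a valid multinomial distribution rather than a deterministic max, which is what lets bottom-up and top-down inference remain tractable.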