Abstract
Scene classification is a challenging problem. Scene images are more abstract than object images because they are composed of objects. Object images and scene images also differ in scale and compositional structure. How to effectively integrate local mid-level semantic representations that cover both object and scene concepts is therefore an important open question for scene classification. In this article, we introduce the idea of a shared codebook, which integrates deep learning, concept features, and local feature encoding. More specifically, the shared local feature codebook is generated with convolutional neural networks from the combined ImageNet1K and Places365 concepts (Mixed1365). The Mixed1365 features cover the semantic information of both object and scene concepts. A shared codebook can therefore be extracted from them; it contains only a subset of the 1,365 concepts while keeping the same codebook size. The shared codebook provides complementary representations without additional codebook training, and it can also be extracted adaptively for different scene classification tasks. We further propose a method that fuses the features encoded with the original codebook and with the shared codebook. This yields more comprehensive and representative image features for classification. Extensive experiments on two public datasets validate the effectiveness of the proposed method. Several observations from these experiments also illustrate the advantages of the shared codebook.
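The abstract does not specify how the shared codebook is extracted or how the two encodings are fused, so the following Python/NumPy code is only a minimal sketch under stated assumptions. It assumes the following:

- Each local patch is described by a 1,365-dimensional concept-probability vector (1,000 ImageNet1K plus 365 Places365 concepts).
- An original codebook of K codewords already exists in that space.
- The shared codebook is obtained by restricting those same K codewords to a task-dependent subset of concept dimensions. Here the subset is chosen, hypothetically, by highest mean response on the task's patches.
- Fusion is plain concatenation.

VLAD is used only as a familiar local-feature encoder, and the article may use a different encoding scheme. The helper names (`vlad_encode`, `select_shared_dims`, `fused_representation`) are hypothetical. The random data stands in for real CNN outputs.

```python
import numpy as np


def vlad_encode(features, codebook):
    """VLAD-style encoding: sum residuals of each feature to its nearest codeword."""
    # features: (N, D) local patch descriptors; codebook: (K, D) codewords
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    nearest = dists.argmin(axis=1)
    k, d = codebook.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        assigned = features[nearest == i]
        if len(assigned):
            vlad[i] = (assigned - codebook[i]).sum(axis=0)
    vlad = vlad.ravel()
    # Signed square-root (power) normalization, then L2 normalization
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


def select_shared_dims(task_features, num_dims):
    """HYPOTHETICAL rule (assumption, not from the article): keep the concept
    dimensions with the highest mean response on the target task's patches."""
    mean_response = task_features.mean(axis=0)
    return np.sort(np.argsort(mean_response)[::-1][:num_dims])


def fused_representation(patch_features, codebook, shared_dims):
    """Concatenate the encodings w.r.t. the original and the assumed shared codebook."""
    original = vlad_encode(patch_features, codebook)
    # Assumed shared codebook: the same K codewords restricted to the selected
    # concept dimensions, so no additional clustering is performed.
    shared = vlad_encode(patch_features[:, shared_dims], codebook[:, shared_dims])
    return np.concatenate([original, shared])


# Usage with random stand-in data (not real CNN outputs)
rng = np.random.default_rng(0)
D, K = 1365, 8  # 1,000 ImageNet1K + 365 Places365 concepts; K is illustrative
codebook = rng.dirichlet(np.ones(D), size=K)  # stand-in codewords
patches = rng.dirichlet(np.ones(D), size=50)  # stand-in patch concept posteriors
dims = select_shared_dims(patches, num_dims=365)
x = fused_representation(patches, codebook, dims)
print(x.shape)  # (13840,) = 8 * 1365 + 8 * 365
```

Restricting existing codewords to a subset of dimensions is one simple way to obtain a second codebook that needs no further clustering and that depends on the target dataset. It illustrates those two properties from the abstract and does not reproduce the article's procedure.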
Deep Patch Representations with Shared Codebook for Scene Classification