Abstract
Image set–based classification has attracted substantial research interest because of its broad applications. Many recent methods based on feature learning or dictionary learning have been developed for this problem, and some have achieved promising results. However, most of them either transform the image set into a 2D matrix or use 2D convolutional neural networks (CNNs) for feature learning, so the spatial and temporal information in the set is lost. Moreover, these methods extract features from the original images, which may exhibit large intra-class diversity. To address these issues, we propose simultaneous image reconstruction with deep learning and feature learning with 3D-CNNs (SIRFL) for image set classification. The SIRFL approach consists of a deep image reconstruction network and a 3D-CNN-based feature learning network. The reconstruction network reduces the diversity of images from the same set, and the feature learning network uses 3D-CNNs to retain spatial and temporal information. Extensive experiments on five widely used datasets show that SIRFL is competitive with state-of-the-art image set classification methods.
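To illustrate why a 3D convolution can retain information across the images of a set, the following is a minimal sketch and not the SIRFL architecture. It assumes the image set is stacked as a T x H x W volume; the names conv3d_valid, make_frame, image_set, and temporal_kernel are hypothetical and chosen only for this example. The kernel spans two consecutive frames, so each output value depends on how the images change along the set axis, which a 2D convolution applied to each image separately cannot capture.

```python
# Hypothetical illustration only: a naive "valid" 3D convolution
# (cross-correlation, as in most CNN libraries) over an image set
# stacked as a T x H x W volume. This is not the SIRFL network.


def conv3d_valid(volume, kernel):
    """Slide a kt x kh x kw kernel over a T x H x W volume (stride 1, no padding)."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                acc = 0.0
                # Accumulate over the temporal AND both spatial kernel axes.
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            acc += kernel[dt][di][dj] * volume[t + dt][i + di][j + dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def make_frame(value, size=4):
    """Build a constant size x size frame (toy stand-in for an image)."""
    return [[float(value)] * size for _ in range(size)]


# Toy image set: three 4x4 frames whose brightness increases along the set axis.
image_set = [make_frame(1), make_frame(2), make_frame(4)]

# A 2x1x1 temporal-difference kernel: next frame minus current frame.
temporal_kernel = [[[-1.0]], [[1.0]]]

features = conv3d_valid(image_set, temporal_kernel)
reversed_features = conv3d_valid(image_set[::-1], temporal_kernel)

# Output keeps a temporal axis: (T - 1) x H x W.
print(len(features), len(features[0]), len(features[0][0]))
# Responses encode frame-to-frame change, so reversing the set changes them.
print(features[0][0][0], features[1][0][0], reversed_features[0][0][0])
```

In this toy case the output still has a frame axis, and reversing the order of the frames flips the sign of the responses, so the features are sensitive to temporal structure. A practical system would use a deep learning library's 3D convolution layers with learned kernels rather than this hand-written loop.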
Simultaneous Image Reconstruction and Feature Learning with 3D-CNNs for Image Set–Based Classification