Abstract
We present a novel fine-grained image recognition framework that uses user click data to bridge the semantic gap in distinguishing visually similar categories. Because the query set in click data is usually large-scale and redundant, we first propose a click-feature-based query-merging approach that merges queries with similar semantics and constructs a compact click feature. We then use this compact click feature together with a convolutional neural network (CNN)-based deep visual feature to jointly represent an image. Finally, with the combined feature, we employ a metric-learning-based template-matching scheme for efficient recognition. Considering the heavy noise in the training data, we introduce a reliability variable to characterize image reliability, and propose a weakly supervised metric and template learning with smooth assumption and click prior (WMTLSC) method to jointly learn the distance metric, object templates, and image reliability. Extensive experiments are conducted on the public Clickture-Dog dataset and our newly established Clickture-Bird dataset. The results show that click-data-based query merging helps generate a highly compact (the dimension is reduced to 0.9%) and dense click feature for images, which greatly improves computational efficiency. Moreover, combining this click feature with the CNN feature further boosts recognition accuracy. The proposed framework performs much better than previous state-of-the-art methods on fine-grained recognition tasks.
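The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the merging rule, the feature fusion, or the WMTLSC optimization, so everything below is an assumption made for illustration. Queries are merged greedily by cosine similarity of their click profiles (the threshold `sim_threshold` is hypothetical), the joint representation is a plain concatenation of L2-normalized features, the templates are class means, and the metric `M` is an identity matrix standing in for a learned one. The reliability variable is not modeled.

```python
import numpy as np


def merge_queries(clicks, sim_threshold=0.8):
    """Greedy stand-in for click-feature-based query merging (assumption).

    clicks: (n_images, n_queries) matrix of click counts.
    Returns the compact (n_images, n_groups) click feature and the
    group index assigned to each query.
    """
    # Describe each query by its L2-normalized click profile over images.
    profiles = clicks.T.astype(float)
    profiles /= np.linalg.norm(profiles, axis=1, keepdims=True) + 1e-12

    labels = np.empty(len(profiles), dtype=int)
    reps = []  # first profile seen in each group acts as its representative
    for j, profile in enumerate(profiles):
        sims = [float(profile @ r) for r in reps]
        if sims and max(sims) >= sim_threshold:
            labels[j] = int(np.argmax(sims))  # join the most similar group
        else:
            labels[j] = len(reps)             # open a new group
            reps.append(profile)

    # Sum the click counts of all queries merged into the same group.
    compact = np.stack(
        [clicks[:, labels == g].sum(axis=1) for g in range(len(reps))], axis=1
    )
    return compact, labels


def joint_feature(visual, click):
    """Concatenate L2-normalized visual and click features (assumption)."""
    def l2n(x):
        x = np.asarray(x, dtype=float)
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return np.hstack([l2n(visual), l2n(click)])


def match_templates(features, templates, M):
    """Assign each feature to the template with the smallest squared
    Mahalanobis distance (x - t)^T M (x - t)."""
    diff = features[:, None, :] - templates[None, :, :]
    d2 = np.einsum("ncd,de,nce->nc", diff, M, diff)
    return d2.argmin(axis=1)


# Toy data: 4 images, 4 queries; queries 0/1 and 2/3 have similar profiles.
clicks = np.array([[5, 4, 0, 0],
                   [3, 3, 0, 1],
                   [0, 0, 6, 5],
                   [0, 1, 4, 4]])
y = np.array([0, 0, 1, 1])                           # toy class labels
visual = np.random.default_rng(0).normal(size=(4, 8))  # placeholder for CNN features

compact, query_groups = merge_queries(clicks)
x = joint_feature(visual, compact)
templates = np.stack([x[y == c].mean(axis=0) for c in np.unique(y)])
M = np.eye(x.shape[1])                               # placeholder for a learned metric
pred = match_templates(x, templates, M)
print(query_groups, compact.shape, pred)
```

In this toy case the four queries collapse into two groups, so the click feature shrinks from four dimensions to two. With a learned metric and templates in place of the placeholders, the matching step would stay the same: one distance computation per class template rather than per training image, which is where the efficiency of template matching comes from.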
Index Terms
User-Click-Data-Based Fine-Grained Image Recognition via Weakly Supervised Metric Learning