User-Click-Data-Based Fine-Grained Image Recognition via Weakly Supervised Metric Learning

Published: 24 July 2018

Abstract

We present a novel fine-grained image recognition framework using user click data, which can bridge the semantic gap in distinguishing visually similar categories. Because the query set in click data is usually large-scale and redundant, we first propose a click-feature-based query-merging approach that merges queries with similar semantics and constructs a compact click feature. We then use this compact click feature together with a convolutional neural network (CNN)-based deep visual feature to jointly represent an image. Finally, with the combined feature, we employ a metric-learning-based template-matching scheme for efficient recognition. Considering the heavy noise in the training data, we introduce a reliability variable to characterize the reliability of each image, and propose a weakly-supervised metric and template learning with smooth assumption and click prior (WMTLSC) method to jointly learn the distance metric, object templates, and image reliability. Extensive experiments are conducted on the public Clickture-Dog dataset and our newly established Clickture-Bird dataset. The results show that click-data-based query merging helps generate a highly compact (the dimension is reduced to 0.9%) and dense click feature for images, which greatly improves computational efficiency. In addition, combining this click feature with the CNN feature further boosts recognition accuracy. The proposed framework performs much better than previous state-of-the-art methods in fine-grained recognition tasks.
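
The three stages named in the abstract (query merging, joint click and visual representation, template matching under a distance metric) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the merging rule, so greedy cosine-similarity grouping with a hypothetical `threshold` is assumed here, and the metric `M` and the templates are taken as given rather than learned jointly with image reliability as in WMTLSC. All function and parameter names are illustrative.

```python
import numpy as np


def merge_queries(clicks, threshold=0.8):
    """Group similar queries and sum their clicks (assumed greedy rule).

    clicks: (n_images, n_queries) non-negative click-count matrix.
    Returns an (n_images, n_groups) compact click feature.
    """
    n_images, n_queries = clicks.shape
    norms = np.linalg.norm(clicks, axis=0)
    norms[norms == 0] = 1.0           # avoid division by zero for empty queries
    unit = clicks / norms             # unit-length click vector per query

    group_of = np.empty(n_queries, dtype=int)
    representatives = []              # first query of each group
    for q in range(n_queries):
        for g, rep in enumerate(representatives):
            if unit[:, q] @ rep >= threshold:   # cosine similarity test
                group_of[q] = g
                break
        else:
            group_of[q] = len(representatives)
            representatives.append(unit[:, q])

    compact = np.zeros((n_images, len(representatives)))
    for q in range(n_queries):
        compact[:, group_of[q]] += clicks[:, q]
    return compact


def joint_feature(visual, compact_click):
    """Concatenate row-wise L2-normalised visual and click features."""
    def l2_normalise(x):
        n = np.linalg.norm(x, axis=1, keepdims=True)
        n[n == 0] = 1.0
        return x / n
    return np.hstack([l2_normalise(visual), l2_normalise(compact_click)])


def classify(x, templates, M):
    """Return the index of the nearest template under (x - t)^T M (x - t)."""
    diff = templates - x
    dists = np.einsum("ij,jk,ik->i", diff, M, diff)
    return int(np.argmin(dists))
```

In this sketch one template per class could simply be the mean joint feature of that class, with `M` set to the identity matrix; the paper instead learns the metric, the templates, and the per-image reliability together.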

