Abstract
Over the last several decades, research on visual object retrieval and recognition has achieved rapid and remarkable success. However, while category-level tasks prevail in the community, instance-level tasks (especially recognition) have not yet received adequate attention. Applications such as content-based search engines and robot vision systems have raised awareness of the need to bring instance-level tasks into more realistic and challenging scenarios. Motivated by the limited scope of existing instance-level datasets, in this article we propose a new benchmark for INSTance-level visual object REtrieval and REcognition (INSTRE). Compared with existing datasets, INSTRE has the following major properties: (1) balanced data scale, (2) more diverse intraclass instance variations, (3) cluttered and less contextual backgrounds, (4) object localization annotation for each image, and (5) well-manipulated double-labelled images for evaluating the multiple-object (within one image) case. We quantify and visualize the merits of the INSTRE data and extensively compare them against existing datasets. Then, on INSTRE, we comprehensively evaluate several popular algorithms for the large-scale object retrieval problem with multiple evaluation metrics. Experimental results show that all the methods suffer a performance drop on INSTRE, showing that instance-level retrieval remains a challenging problem. Finally, we integrate these algorithms into a simple yet efficient scheme for recognition and compare it with classification-based methods. Importantly, we introduce the realistic multi-object recognition problem. All experiments are conducted in both the single-object and the multiple-object cases.
INSTRE: A New Benchmark for Instance-Level Object Retrieval and Recognition