INSTRE: A New Benchmark for Instance-Level Object Retrieval and Recognition

Published: 05 February 2015

Abstract

Over the last several decades, research on visual object retrieval and recognition has advanced rapidly. However, while category-level tasks prevail in the community, instance-level tasks (especially recognition) have not yet received adequate attention. Applications such as content-based search engines and robot vision systems have highlighted the need to study instance-level tasks in more realistic and challenging scenarios. Motivated by the limited scope of existing instance-level datasets, in this article we propose a new benchmark for INSTance-level visual object REtrieval and REcognition (INSTRE). Compared with existing datasets, INSTRE has the following major properties: (1) balanced data scale, (2) more diverse intraclass instance variations, (3) cluttered and less contextual backgrounds, (4) object localization annotation for each image, and (5) carefully manipulated double-labelled images for evaluating the case of multiple objects within one image. We quantify and visualize the merits of the INSTRE data and compare them extensively against existing datasets. On INSTRE, we then comprehensively evaluate several popular algorithms for large-scale object retrieval using multiple evaluation metrics. Experimental results show that all the methods suffer a performance drop on INSTRE, indicating that the task remains challenging. Finally, we integrate these algorithms into a simple yet efficient scheme for recognition and compare it with classification-based methods. Importantly, we introduce the realistic multi-object recognition problem. All experiments are conducted in both the single-object and the multiple-object cases.
