Abstract
Most recent large-scale image search approaches build on the bag-of-visual-words (BoW) model, in which local features are quantized and then efficiently matched between images. However, the limited discriminative power of local features and BoW quantization errors cause many mismatches between images, which limits search accuracy. To improve accuracy, geometric verification is widely adopted to identify geometrically consistent local matches, but such matches are hard to use directly to distinguish partial-duplicate images from non-partial-duplicate ones. To address this issue, rather than simply identifying geometrically consistent matches, we propose a region-level visual consistency verification scheme that confirms whether visually consistent region (VCR) pairs exist between images for partial-duplicate search. Specifically, after local feature matching, potential VCRs are constructed by mapping the regions segmented from candidate images onto the query image, using the properties of the matched local features. Compact gradient descriptors and convolutional neural network descriptors are then extracted and matched between the potential VCRs to verify their visual consistency and determine whether they are true VCRs. Moreover, two fast pruning algorithms are proposed to further improve efficiency. Extensive experiments demonstrate that the proposed approach achieves higher accuracy than the state of the art while providing comparable efficiency for large-scale partial-duplicate search tasks.
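The two core steps of the scheme can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a simplified setting in which each local feature is a SIFT-style keypoint `(x, y, scale, orientation)`, so a single matched pair fixes a similarity transform that maps a segmented candidate region onto the query image, and in which "visual consistency" is approximated by thresholded cosine similarity between region descriptors. The function names and the threshold value are illustrative, not from the paper.

```python
import numpy as np

def map_region(region_pts, q_feat, c_feat):
    """Map a candidate-image region into the query image.

    q_feat / c_feat are matched keypoints (x, y, scale, orientation);
    their relative scale and rotation define a similarity transform
    (a simplification of using matched-feature properties for mapping).
    """
    qx, qy, qs, qo = q_feat
    cx, cy, cs, co = c_feat
    s = qs / cs                       # relative scale
    a = qo - co                       # relative rotation (radians)
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]])
    pts = np.asarray(region_pts, dtype=float) - [cx, cy]
    return (s * pts @ rot.T) + [qx, qy]

def visually_consistent(desc_q, desc_c, thresh=0.8):
    """Declare a potential VCR pair consistent when the cosine
    similarity of their region descriptors exceeds a threshold."""
    d1 = np.asarray(desc_q, dtype=float)
    d2 = np.asarray(desc_c, dtype=float)
    sim = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-12)
    return sim >= thresh
```

In practice the descriptors compared here would be the compact gradient descriptor and the CNN descriptor mentioned above, and the mapping would be estimated from all matched features in a region rather than a single pair.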
Region-Level Visual Consistency Verification for Large-Scale Partial-Duplicate Image Search