Region-Level Visual Consistency Verification for Large-Scale Partial-Duplicate Image Search

Published: 22 May 2020

Abstract

Most recent large-scale image search approaches build on the bag-of-visual-words (BOW) model, in which local features are quantized and then efficiently matched between images. However, the limited discriminability of local features and BOW quantization errors cause many mismatches between images, which limits search accuracy. To improve accuracy, geometric verification is widely adopted to identify geometrically consistent local matches, but it is hard to use these matches directly to distinguish partial-duplicate images from non-partial-duplicate ones. To address this issue, instead of simply identifying geometrically consistent matches, we propose a region-level visual consistency verification scheme that confirms whether there are visually consistent region (VCR) pairs between images for partial-duplicate search. Specifically, after local feature matching, potential VCRs are constructed by mapping the regions segmented from candidate images onto the query image, using the properties of the matched local features. Then, a compact gradient descriptor and a convolutional neural network descriptor are extracted and matched between the potential VCRs to verify their visual consistency and determine whether they are true VCRs. Moreover, two fast pruning algorithms are proposed to further improve efficiency. Extensive experiments demonstrate that the proposed approach achieves higher accuracy than the state of the art while providing comparable efficiency for large-scale partial-duplicate search tasks.
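The verification step described above can be illustrated with a toy sketch: a pair of candidate regions is accepted as a VCR pair only if their region-level descriptors are sufficiently similar. This is not the paper's implementation; the gradient-orientation histogram below is merely an illustrative stand-in for the compact gradient descriptor (the CNN descriptor and the region-mapping step are omitted), and the function names and the similarity threshold are hypothetical.

```python
import numpy as np

def gradient_descriptor(region, bins=16):
    """Compact gradient-orientation histogram for a grayscale region.

    Illustrative stand-in for a region-level gradient descriptor:
    gradient magnitudes vote into orientation bins, then the
    histogram is L2-normalized so comparisons are scale-free.
    """
    gy, gx = np.gradient(region.astype(float))
    mag = np.hypot(gx, gy)                       # gradient magnitude per pixel
    ang = np.arctan2(gy, gx) % (2 * np.pi)       # orientation in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def regions_consistent(region_a, region_b, threshold=0.8):
    """Accept a potential region pair as visually consistent when the
    cosine similarity of their descriptors exceeds a (hypothetical)
    threshold. Descriptors are unit-norm, so a dot product suffices."""
    da = gradient_descriptor(region_a)
    db = gradient_descriptor(region_b)
    return float(da @ db) >= threshold
```

For example, a region compared with itself yields similarity 1.0 and is accepted, while a horizontal-ramp region compared with its transpose (a vertical ramp) concentrates its gradient energy in disjoint orientation bins and is rejected. In the full scheme, this check would run only on the potential VCR pairs that survive the pruning algorithms.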

