Abstract
This article studies the correspondence problem for semantically similar images, which is challenging because of joint visual and geometric deformations. We introduce the Flip-aware Distance Ratio (FDR) method, which addresses this problem from the perspective of geometric structure analysis. First, a distance ratio constraint enforces geometric consistency between images with large visual variations, while a smoothness term tolerates local geometric jitter. For challenging cases with symmetric structures, the proposed method exploits curl to suppress mismatches. Image correspondence is then formulated as a permutation problem, for which we propose a Gradient Guided Simulated Annealing (GGSA) algorithm that performs robust discrete optimization. Experiments on simulated and real-world datasets exhibiting both visual and geometric deformations indicate that our method improves significantly on the baselines for both visually and semantically similar images.
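The abstract combines three ingredients: a distance-ratio term, a flip (curl) term for symmetric structures, and a discrete search over permutations. The Python sketch below illustrates how such pieces can fit together, under assumptions that are ours and not the paper's. The distance-ratio term is taken as the variance of log pairwise-distance ratios. The curl term is replaced by a simple triangle-orientation test. The smoothness term is omitted. The optimizer is plain simulated annealing with random swap moves, not GGSA (no gradient guidance). All names (`distance_ratio_cost`, `flip_cost`, `anneal`) and the weight `lam` are hypothetical.

```python
# Hypothetical sketch: NOT the FDR/GGSA formulation from the paper.
# Assumptions: variance of log distance ratios as the distance-ratio term,
# triangle orientation as a stand-in for curl, plain swap-based annealing.
import math
import random
from itertools import combinations

Point = tuple[float, float]


def distance_ratio_cost(P: list[Point], Q: list[Point], perm: list[int]) -> float:
    """Variance of log pairwise-distance ratios (assumes distinct points).

    Under a similarity transform every ratio |q_perm[i] - q_perm[j]| / |p_i - p_j|
    equals the same scale factor, so the variance of the log-ratios is zero.
    """
    logs = []
    for i, j in combinations(range(len(P)), 2):
        dp = math.dist(P[i], P[j])
        dq = math.dist(Q[perm[i]], Q[perm[j]])
        logs.append(math.log(dq / dp))
    mean = sum(logs) / len(logs)
    return sum((x - mean) ** 2 for x in logs) / len(logs)


def orientation(a: Point, b: Point, c: Point) -> float:
    """z-component of (b - a) x (c - a): > 0 counter-clockwise, < 0 clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def flip_cost(P: list[Point], Q: list[Point], perm: list[int]) -> float:
    """Fraction of non-degenerate triangles whose orientation the matching reverses.

    A mirror-image mismatch on a symmetric object keeps every distance ratio
    but reverses the orientation of every triangle.
    """
    flipped = total = 0
    for i, j, k in combinations(range(len(P)), 3):
        s1 = orientation(P[i], P[j], P[k])
        s2 = orientation(Q[perm[i]], Q[perm[j]], Q[perm[k]])
        if s1 == 0 or s2 == 0:
            continue  # skip collinear triples
        total += 1
        if s1 * s2 < 0:
            flipped += 1
    return flipped / total if total else 0.0


def total_cost(P, Q, perm, lam=1.0) -> float:
    return distance_ratio_cost(P, Q, perm) + lam * flip_cost(P, Q, perm)


def anneal(P, Q, iters=20000, t0=1.0, cooling=0.9995, lam=1.0, seed=0):
    """Plain simulated annealing over permutations with random swap moves."""
    rng = random.Random(seed)
    n = len(P)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = total_cost(P, Q, perm, lam)
    best, best_cost = perm[:], cost
    t = t0
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        cand = perm[:]
        cand[i], cand[j] = cand[j], cand[i]
        c = total_cost(P, Q, cand, lam)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if c <= cost or rng.random() < math.exp((cost - c) / t):
            perm, cost = cand, c
            if cost < best_cost:
                best, best_cost = perm[:], cost
        t *= cooling
    return best, best_cost


if __name__ == "__main__":
    # A left-right symmetric shape and a rotated, scaled, shifted copy of it.
    P = [(-2.0, 0.0), (2.0, 0.0), (-1.0, 3.0), (1.0, 3.0), (0.0, 5.0)]
    th, s = 0.5, 1.5
    Q = [(s * (math.cos(th) * x - math.sin(th) * y) + 3.0,
          s * (math.sin(th) * x + math.cos(th) * y) - 1.0) for x, y in P]
    mirror = [1, 0, 3, 2, 4]  # swaps the left and right parts
    print(distance_ratio_cost(P, Q, mirror), flip_cost(P, Q, mirror))
    print(anneal(P, Q))
```

In the demo, the mirrored assignment `[1, 0, 3, 2, 4]` has a numerically zero distance-ratio cost, so the distance term alone cannot reject it; the orientation term scores it 1.0 because every triangle is reversed. Swap-based annealing is used only as a generic permutation search; it does not reproduce the gradient guidance of GGSA.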
Index Terms
Semantic Correspondence with Geometric Structure Analysis
Recommendations
A geometric criterion for shape-based non-rigid correspondence
ICCV '95: Proceedings of the Fifth International Conference on Computer Vision
A geometric criterion is developed for establishing shape-based non-rigid correspondence between plane curves. Unlike previous efforts, the criterion does not use rigid invariants of shape. Instead, shapes are compared non-rigidly from the vantage point ...
3-D surface reconstruction from stereoscopic image sequences
ICCV '95: Proceedings of the Fifth International Conference on Computer Vision
A stereoscopic scene analysis system for 3-D modeling of objects from stereoscopic image sequences is described. A dense map of 3-D surface points is obtained by image correspondence, object segmentation, interpolation, and triangulation. Emphasis is ...
MatchDR: Image Correspondence by Leveraging Distance Ratio Constraint
MM '16: Proceedings of the 24th ACM international conference on Multimedia
Image correspondence is to establish the connections between coherent images, which can be quite challenging due to the visual and geometric deformations. This paper proposes a robust image correspondence technique from the perspective of spatial ...