
Near-duplicate keyframe retrieval by semi-supervised learning and nonrigid image matching

Published: 04 February 2011

Abstract

Near-duplicate keyframe (NDK) retrieval techniques are critical to many real-world multimedia applications. Over the last few years, near-duplicate image/keyframe retrieval has attracted a surge of attention in the multimedia community. To enable effective NDK retrieval on large-scale data, we propose a Multi-Level Ranking (MLR) scheme that retrieves NDKs in a coarse-to-fine manner. One key stage of the MLR scheme is learning an effective ranking function from extremely few training examples in a near-duplicate detection task. To address this challenge, we employ a semi-supervised learning method, semi-supervised support vector machines, which significantly improves retrieval performance by exploiting unlabeled data. Another key stage of the MLR scheme is fine matching among the subset of keyframe candidates retrieved by the preceding coarse ranking stage. In contrast to previous approaches based on either simple heuristics or rigid matching models, we propose a novel Nonrigid Image Matching (NIM) approach that performs this fine matching for near-duplicate keyframe retrieval from real-world video corpora. Compared with conventional methods, the proposed NIM approach simultaneously recovers an explicit mapping between two near-duplicate images using a few deformation parameters and identifies the correct correspondences in noisy data. To evaluate the effectiveness of our approach, we performed extensive experiments on two benchmark testbeds extracted from the TRECVID2003 and TRECVID2004 corpora. The promising results indicate that our method is more effective than other state-of-the-art approaches for near-duplicate keyframe retrieval.
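
The coarse-to-fine idea behind a multi-level ranking scheme can be illustrated with a minimal sketch. This is not the authors' MLR implementation: their coarse stage uses a ranking function learned with semi-supervised SVMs and their fine stage uses NIM, whereas here the coarse ranker is assumed to be plain Euclidean distance on global feature vectors, and the fine matcher is a caller-supplied, hypothetical `fine_score` function. The names `euclidean`, `coarse_to_fine_rank`, `fine_score`, and `top_k` are illustrative only, and the toy features and scores are made up.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Cheap global-feature distance, assumed here as the coarse-stage ranker."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def coarse_to_fine_rank(
    query_id: str,
    features: Dict[str, Vector],
    fine_score: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical two-level ranking: coarse filter, then fine re-rank.

    `fine_score` stands in for an expensive matcher (for example a nonrigid
    image matcher); a higher score means a more likely near-duplicate.
    """
    query = features[query_id]
    # Coarse stage: order all other keyframes by cheap global distance, keep top_k.
    candidates = sorted(
        (kid for kid in features if kid != query_id),
        key=lambda kid: euclidean(query, features[kid]),
    )[:top_k]
    # Fine stage: run the expensive matcher only on the short candidate list.
    rescored = [(kid, fine_score(query_id, kid)) for kid in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy usage with made-up 2-D global features and made-up fine-matching scores.
features = {
    "q": [0.0, 0.0],
    "a": [0.1, 0.0],
    "b": [0.0, 0.2],
    "c": [0.3, 0.3],
    "d": [5.0, 5.0],
}
fine_scores = {"a": 4.0, "b": 12.0, "c": 9.0, "d": 50.0}
ranking = coarse_to_fine_rank("q", features, lambda q, k: fine_scores[k], top_k=3)
print(ranking)  # [('b', 12.0), ('c', 9.0), ('a', 4.0)]
```

Keyframe "d" has the highest fine score but is pruned by the coarse stage and never reaches the fine matcher. This shows the basic trade-off of a coarse-to-fine design: a larger `top_k` raises recall at the cost of running more expensive fine matches.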
