DOI: 10.1145/2324796.2324857
research-article

A visual approach for video geocoding using bag-of-scenes

Published: 05 June 2012

ABSTRACT

This paper presents a novel approach for video representation, called bag-of-scenes. The proposed method is based on dictionaries of scenes, which provide a high-level representation for videos. Scenes carry much more semantic information than local features, especially for geotagging videos using visual content. Thus, each component of the representation model has self-contained semantics and, hence, can be directly related to a specific place of interest. Experiments were conducted in the context of the MediaEval 2011 Placing Task. The reported results compare our strategy with those of other participants that used only visual content to accomplish this task. Despite our very simple way of generating the visual dictionary, which selects photos at random, the results show that our approach achieves high accuracy relative to state-of-the-art solutions.
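
As a rough illustration of the idea described in the abstract, the sketch below builds a scene dictionary by sampling photos at random and represents a video as a histogram over those scenes. The abstract does not give the feature extractor, the distance measure, or the assignment scheme, so the plain feature vectors, Euclidean distance, hard (nearest-scene) assignment, and all function names here are assumptions made for illustration only, not the authors' implementation.

```python
import math
import random


def build_scene_dictionary(photo_features, k, seed=0):
    """Pick k photo feature vectors at random to serve as the 'scenes'.

    Assumption: each photo is already described by a fixed-length vector.
    """
    rng = random.Random(seed)
    return rng.sample(photo_features, k)


def nearest_scene(frame, dictionary):
    """Index of the dictionary scene closest to a frame (Euclidean distance)."""
    return min(range(len(dictionary)),
               key=lambda i: math.dist(frame, dictionary[i]))


def bag_of_scenes(frame_features, dictionary):
    """Normalized histogram of nearest-scene assignments over a video's frames.

    Assumption: hard assignment, one vote per frame.
    """
    hist = [0.0] * len(dictionary)
    for frame in frame_features:
        hist[nearest_scene(frame, dictionary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy usage: two hypothetical scenes and four frames of one video.
scenes = [[0.0, 0.0], [10.0, 10.0]]
frames = [[1.0, 1.0], [9.0, 9.0], [0.0, 1.0], [0.5, 0.0]]
video_vector = bag_of_scenes(frames, scenes)
```

Because each histogram bin corresponds to one photo, a bin can in principle be tied to the place where that photo was taken, which is the property the abstract highlights for geocoding.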


Published in

ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
June 2012
489 pages
ISBN: 9781450313292
DOI: 10.1145/2324796

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 5 June 2012

Qualifiers

• research-article

Acceptance Rates

ICMR '12 Paper Acceptance Rate: 50 of 145 submissions, 34%
Overall Acceptance Rate: 254 of 830 submissions, 31%
