DOI: 10.1145/1991996.1992047
research-article

Automatic tagging and geotagging in video collections and communities

Published: 18 April 2011

ABSTRACT

Automatically generated tags and geotags hold great promise for improving access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, we also present a reference algorithm that was used within MediaEval 2010 and comment on lessons learned. The Tagging Task, Professional, involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web, involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.

Published in

ICMR '11: Proceedings of the 1st ACM International Conference on Multimedia Retrieval
April 2011
512 pages
ISBN: 9781450303361
DOI: 10.1145/1991996

Copyright © 2011 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 254 of 830 submissions, 31%
