ABSTRACT
Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, we present a reference algorithm that was used within MediaEval 2010 and comment on the lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.