ABSTRACT

The publication of the Yahoo Flickr Creative Commons 100 Million dataset (YFCC100M), to date the largest open-access collection of photos and videos, has provided a unique opportunity to stimulate new research in multimedia analysis and retrieval. To make the YFCC100M even more valuable, we have started to supplement it with a comprehensive set of precomputed features and high-quality ground-truth annotations. As part of these efforts, we are releasing the YLI feature corpus, as well as the YLI-GEO and YLI-MED annotation subsets. Under the Multimedia Commons Project (MMCP), we are currently laying the groundwork for a common platform and framework around the YFCC100M that (i) helps researchers contribute additional features and annotations, (ii) supports experimentation on the dataset, and (iii) enables sharing of the results obtained. This paper describes the YLI features and annotations released thus far, and sketches our vision for the MMCP.
Kickstarting the Commons: The YFCC100M and the YLI Corpora