DOI: 10.1145/2814815.2816986
research-article

Kickstarting the Commons: The YFCC100M and the YLI Corpora

Published: 30 October 2015

ABSTRACT

The publication of the Yahoo Flickr Creative Commons 100 Million dataset (YFCC100M), to date the largest open-access collection of photos and videos, has provided a unique opportunity to stimulate new research in multimedia analysis and retrieval. To make the YFCC100M even more valuable, we have started working towards supplementing it with a comprehensive set of precomputed features and high-quality ground truth annotations. As part of our efforts, we are releasing the YLI feature corpus, as well as the YLI-GEO and YLI-MED annotation subsets. Under the Multimedia Commons Project (MMCP), we are currently laying the groundwork for a common platform and framework around the YFCC100M that (i) facilitates researchers in contributing additional features and annotations, (ii) supports experimentation on the dataset, and (iii) enables sharing of obtained results. This paper describes the YLI features and annotations released thus far, and sketches our vision for the MMCP.

Published in

MMCommons '15: Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions
October 2015
50 pages
ISBN: 9781450337441
DOI: 10.1145/2814815

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
