ABSTRACT
With the Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset, a novel dataset was introduced to the computer vision and multimedia research community. To maximize its benefit to the community and realize its potential, the dataset has to be made accessible through tools that allow searching for target concepts and through mechanisms for browsing its images and videos. Following best practice from data collections such as ImageNet and MS COCO, this paper presents means of accessing the YFCC100m dataset: a global analysis of the dataset and an online browser for exploring and investigating subsets of the dataset in real time. Statistics on the queried images and videos enable researchers to refine their queries successively, so that the user's desired subset of interest can be narrowed down quickly. The final set of images and videos can be downloaded from the browser as a list of URLs for further processing.
Real-time Analysis and Visualization of the YFCC100m Dataset