research-article

OmniArt: A Large-scale Artistic Benchmark

Published: 23 October 2018

Abstract

Baselines are the starting point of any quantitative multimedia research, and benchmarks are essential for pushing those baselines further. In this article, we present baselines for the artistic domain together with a new benchmark dataset, dubbed OmniArt, featuring over 2 million images with rich structured metadata. OmniArt contains annotations for dozens of attribute types and provides semantic context through concepts, IconClass labels, color information, and (limited) object-level bounding boxes. On this dataset, we establish baseline scores for multiple tasks, including artist attribution, creation period estimation, and type, style, and school prediction. In addition to these metadata-related experiments, we explore the color spaces of artworks across different artwork types and evaluate a transfer-learning pipeline for object recognition.

            Published in

              ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 14, Issue 4
              Special Section on Deep Learning for Intelligent Multimedia Analytics
              November 2018
              221 pages
              ISSN: 1551-6857
              EISSN: 1551-6865
              DOI: 10.1145/3282485

              Copyright © 2018 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 23 October 2018
              • Accepted: 1 August 2018
              • Revised: 1 July 2018
              • Received: 1 February 2018

              Qualifiers

              • research-article
              • Research
              • Refereed
