
Learning Discriminative Sentiment Representation from Strongly- and Weakly Supervised CNNs

Published: 07 December 2019

Abstract

Visual sentiment analysis is attracting increasing attention with the rapidly growing volume of images uploaded to social networks. Learning rich visual representations often requires training deep convolutional neural networks (CNNs) on massive manually labeled data, which is expensive or scarce, especially for a subjective task like visual sentiment analysis. Meanwhile, a large quantity of social images is readily available, though noisily labeled, by querying social networks with the sentiment categories as keywords, so that diverse images related to a specific sentiment can be easily collected. In this article, we propose a multiple kernel network for visual sentiment recognition, which learns representations from strongly and weakly supervised CNNs. Specifically, the weakly supervised deep model is trained on large-scale social image data, whereas the strongly supervised deep model is fine-tuned on affective datasets with manual annotation. We apply the multiple kernel scheme to multiple layers of the CNNs, which automatically selects discriminative representations by learning a linear combination of a set of predefined kernels. In addition, we introduce a large-scale dataset collected from popular comics of various countries, including America, Japan, China, and France, which consists of 11,821 images with diverse artistic styles. Experimental results show that the multiple kernel network achieves consistent improvements over state-of-the-art methods on public affective datasets, as well as on the newly established Comics dataset. The Comics dataset can be found at http://cv.nankai.edu.cn/projects/Comic.
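The multiple kernel idea above can be sketched in a few lines: build a bank of predefined kernels over features from two CNN branches, then learn non-negative combination weights. This is only an illustrative sketch, not the paper's exact optimization; random vectors stand in for CNN activations, and the weights are set with a kernel-target-alignment heuristic rather than the learned linear combination the article describes.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # pairwise squared distances, then Gaussian kernel
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def alignment(K, y):
    # kernel-target alignment: cosine similarity between K and y y^T
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

rng = np.random.default_rng(0)
y = np.array([1] * 10 + [-1] * 10)  # binary sentiment labels

# toy features standing in for activations of the two branches:
# the "strongly supervised" branch is assumed to be more discriminative
strong = rng.normal(0, 1, (20, 8)) + 0.8 * y[:, None]
weak = rng.normal(0, 1, (20, 8)) + 0.3 * y[:, None]

# predefined kernel bank over both feature sets
bank = [rbf_kernel(F, g) for F in (strong, weak) for g in (0.01, 0.1, 1.0)]

# non-negative weights from alignment with the labels, normalized to sum to 1
w = np.array([max(alignment(K, y), 0.0) for K in bank])
w = w / w.sum()
K_combined = sum(wi * Ki for wi, Ki in zip(w, bank))
```

The combined kernel `K_combined` would then be fed to a kernel classifier (e.g., an SVM with a precomputed kernel); kernels that align poorly with the labels receive small weight, which is the "automatic selection" effect the abstract refers to.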



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 3s
Special Issue on Face Analysis for Applications and Special Issue on Affective Computing for Large-Scale Heterogeneous Multimedia Data
November 2019
304 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3368027

            Copyright © 2019 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 7 December 2019
            • Accepted: 1 April 2019
            • Revised: 1 March 2019
            • Received: 1 December 2018
Published in TOMM Volume 15, Issue 3s

            Qualifiers

            • research-article
            • Research
            • Refereed
