Research Article

Dual-path Convolutional Image-Text Embeddings with Instance Loss


Abstract

Matching images and sentences demands a fine-grained understanding of both modalities. In this article, we propose a new system that discriminatively embeds images and text into a shared visual-textual space. In this field, most existing works apply a ranking loss to pull positive image/text pairs close and push negative pairs apart. However, directly deploying the ranking loss on heterogeneous features (i.e., text and image features) is less effective, because appropriate triplets are hard to find at the beginning of training, and so the naive use of the ranking loss may keep the network from learning the inter-modal relationship. To address this problem, we propose the instance loss, which explicitly considers the intra-modal data distribution. It is based on an unsupervised assumption that each image/text group can be viewed as a class, so the network can learn fine-grained differences between image/text groups. Our experiments show that the instance loss offers better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. In addition, existing works usually rely on off-the-shelf features, i.e., word2vec and fixed visual features; as a minor contribution, this article constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to learn directly from the data and to fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields competitive accuracy compared to state-of-the-art methods. Moreover, in language-based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available.
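
To make the abstract's central idea concrete, the sketch below illustrates one plausible way to implement the instance loss and combine it with a ranking loss in PyTorch. It is a minimal, hedged illustration rather than the authors' released code: the class and function names (InstanceLossHead, ranking_loss), the feature dimension, the number of instance classes, and the margin value are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InstanceLossHead(nn.Module):
    """Instance loss sketch: every image/text group is treated as its own class.

    A single classifier is shared by both modalities, so image and text
    features of the same group are pushed toward the same class center.
    Dimensions and the number of instances are illustrative placeholders.
    """

    def __init__(self, feat_dim=2048, num_instances=1000):
        super().__init__()
        # num_instances = number of image/text groups in the training set (placeholder here).
        self.classifier = nn.Linear(feat_dim, num_instances)

    def forward(self, img_feat, txt_feat, instance_labels):
        # Classification (instance) loss applied to each modality separately.
        loss_img = F.cross_entropy(self.classifier(img_feat), instance_labels)
        loss_txt = F.cross_entropy(self.classifier(txt_feat), instance_labels)
        return loss_img + loss_txt


def ranking_loss(img_feat, txt_feat, margin=0.2):
    """Bidirectional triplet-style ranking loss over a mini-batch.

    Matched pairs sit on the diagonal of the similarity matrix; all other
    entries act as negatives. This is a common formulation, shown only to
    illustrate how the instance loss and ranking loss can be combined.
    """
    img = F.normalize(img_feat, dim=1)
    txt = F.normalize(txt_feat, dim=1)
    sim = img @ txt.t()                      # (B, B) cosine similarities
    pos = sim.diag().view(-1, 1)             # similarities of matched pairs

    # Hinge terms for image-to-text and text-to-image retrieval directions.
    cost_i2t = (margin + sim - pos).clamp(min=0)
    cost_t2i = (margin + sim - pos.t()).clamp(min=0)

    # Do not penalize the positive pair itself.
    eye = torch.eye(sim.size(0), device=sim.device).bool()
    cost_i2t = cost_i2t.masked_fill(eye, 0.0)
    cost_t2i = cost_t2i.masked_fill(eye, 0.0)
    return cost_i2t.mean() + cost_t2i.mean()
```

Consistent with the abstract, a natural training schedule would first optimize the instance loss alone, treating every image/text group as its own class, and then add the ranking loss once the two branches produce roughly aligned features; the instance loss thus plays the role of the better weight initialization that the abstract describes.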



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 2
  May 2020, 390 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3401894

        Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 May 2020
        • Online AM: 7 May 2020
        • Accepted: 1 February 2020
        • Revised: 1 May 2019
        • Received: 1 August 2018
Published in TOMM Volume 16, Issue 2
