
Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion

Published: 31 March 2021

Abstract

With the development of web technology, multi-modal (or multi-view) data has surged as a major form of big data, in which each modality/view encodes an individual property of the data objects. Different modalities are often complementary to each other, a fact that has motivated considerable research on fusing multi-modal feature spaces to characterize data objects comprehensively. Most existing state-of-the-art methods focus on how to fuse the energy or information from multiple modal spaces so as to outperform their single-modal counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distributions of high-dimensional multimedia data, and this naturally extends to multi-modal data. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which can essentially deepen the fusion across multi-modal deep feature spaces. In this article, we provide a comprehensive overview of the state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout the survey, we further show that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions for the field.
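The fusion idea in the abstract can be made concrete with a minimal sketch. Here, two toy "encoders" stand in for the deep image and text branches (all function names and feature choices are illustrative assumptions, not the survey's method); the sketch contrasts feature-level (early) fusion, which concatenates per-modality features into one joint representation, with decision-level (late) fusion, which combines per-modality scores.

```python
# Toy stand-ins for deep modality-specific encoders. A real system would
# replace these with learned networks (CNN, text encoder, etc.).

def encode_image(pixels):
    # Image branch: summarize pixel intensities into a 2-d feature vector.
    return [sum(pixels) / len(pixels), max(pixels)]

def encode_text(tokens):
    # Text branch: summarize tokens into a 2-d feature vector.
    lengths = [len(t) for t in tokens]
    return [float(len(tokens)), sum(lengths) / len(lengths)]

def early_fusion(pixels, tokens):
    # Feature-level fusion: concatenate modality features into one joint
    # vector, which a downstream classifier would consume as a whole.
    return encode_image(pixels) + encode_text(tokens)

def late_fusion_score(pixels, tokens):
    # Decision-level fusion: score each modality separately, then average
    # the per-modality scores into a single decision value.
    img_score = sum(encode_image(pixels))
    txt_score = sum(encode_text(tokens))
    return (img_score + txt_score) / 2

fused = early_fusion([0.1, 0.9, 0.5], ["a", "web", "image"])
print(len(fused))  # 4: a joint representation spanning both modalities
```

Early fusion lets the downstream model exploit cross-modal correlations directly, while late fusion keeps the branches independent, which is simpler but cannot capture interactions between modalities.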

  120. Meng Wang, Hao Li, Dacheng Tao, Ke Lu, and Xindong Wu. 2012. Multi-modal graph-based reranking for web image search. IEEE Trans. Image Process. 21, 11 (2012), 4649–4611. Google ScholarGoogle ScholarDigital LibraryDigital Library
  121. Qianqian Wang, Zhengming Ding, Zhiqiang Tao, Quanxue Gao, and Yun Fu. 2018. Partial multi-view clustering via consistent GAN. In Proceedings of the IEEE International Conference on Data Mining (ICDM’18). IEEE, 1290–1295.
  122. Tong Wang, Lei Zhu, Zhiyong Cheng, Jingjing Li, and Zan Gao. 2020. Unsupervised deep cross-modal hashing with virtual label regression. Neurocomputing 386 (2020), 84–96.
  123. Xu Wang, Dezhong Peng, Peng Hu, and Yongsheng Sang. 2019. Adversarial correlated autoencoder for unsupervised multi-view representation learning. Knowl. Based Syst. 168 (2019), 109–120.
  124. Yang Wang, Xiaodi Huang, and Lin Wu. 2013. Clustering via geometric median shift over Riemannian manifolds. Inf. Sci. 220 (2013), 292–305.
  125. Yang Wang, Xuemin Lin, Lin Wu, Qing Zhang, and Wenjie Zhang. 2016. Shifting multi-hypergraphs via collaborative probabilistic voting. Knowl. Inf. Syst. 46, 3 (2016), 515–536.
  126. Yang Wang, Xuemin Lin, Lin Wu, and Wenjie Zhang. 2015. Effective multi-query expansions: Robust landmark retrieval. In Proceedings of the ACM Conference on Multimedia.
  127. Yang Wang, Xuemin Lin, Lin Wu, and Wenjie Zhang. 2017. Effective multi-query expansions: Collaborative deep networks for robust landmark retrieval. IEEE Trans. Image Process. 26, 3 (2017), 1393–1404.
  128. Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. 2014. Exploiting correlation consensus: Towards subspace clustering for multi-modal data. In Proceedings of the ACM Conference on Multimedia.
  129. Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. 2015. LBMCH: Learning bridging mapping for cross-modal hashing. In Proceedings of the ACM SIGIR Conference.
  130. Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, Qing Zhang, and Xiaodi Huang. 2015. Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Trans. Image Process. 24, 11 (2015), 3939–3949.
  131. Yang Wang and Lin Wu. 2018. Beyond low-rank representations: Orthogonal clustering basis reconstruction with optimized graph structure for multi-view spectral clustering. Neural Netw. 103 (2018), 1–8.
  132. Yang Wang, Lin Wu, Xuemin Lin, and Junbin Gao. 2018. Multiview spectral clustering via structured low-rank matrix factorization. IEEE Trans. Neural Netw. Learn. Syst. 29, 10 (2018), 4833–4843.
  133. Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, Meng Fang, and Shirui Pan. 2016. Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’16).
  134. Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, and Xiang Zhao. 2015. Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion. IEEE Trans. Neural Netw. Learn. Syst. 28, 1 (2015), 57–70.
  135. Wei Li, Rui Zhao, Tong Xiao, and Xiaogang Wang. 2014. DeepReID: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  136. Yunchao Wei, Yao Zhao, Canyi Lu, Shikui Wei, Luoqi Liu, Zhenfeng Zhu, and Shuicheng Yan. 2017. Cross-modal retrieval with CNN visual features: A new baseline. IEEE Trans. Cybern. 47, 2 (2017), 449–460.
  137. Xin Wen, Zhizhong Han, Xinyu Yin, and Yu-Shen Liu. 2019. Adversarial cross-modal retrieval via learning and transferring single-modal similarities. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME’19). IEEE, 478–483.
  138. Gengshen Wu, Zijia Lin, Jungong Han, Li Liu, Guiguang Ding, Baochang Zhang, and Jialie Shen. 2018. Unsupervised deep hashing via binary latent factor models for large-scale cross-modal retrieval. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’18). 2854–2860.
  139. Lin Wu, Richang Hong, Yang Wang, and Meng Wang. 2019. Cross-entropy adversarial view adaptation for person re-identification. IEEE Trans. Circ. Syst. Video Technol. 30, 7 (2019), 2081–2092.
  140. Lin Wu and Yang Wang. 2017. Robust hashing for multi-view data: Jointly learning low-rank kernelized similarity consensus and hash functions. Image Vision Comput. 57 (2017), 58–66.
  141. Lin Wu, Yang Wang, Junbin Gao, and Xue Li. 2018. Where-and-when to look: Deep siamese attention networks for video-based person re-identification. IEEE Trans. Multimedia 21, 6 (2018), 1412–1424.
  142. Lin Wu, Yang Wang, Junbin Gao, Meng Wang, Zheng-Jun Zha, and Dacheng Tao. 2020. Deep co-attention based comparators for relative representation learning in person re-identification. IEEE Trans. Neural Netw. Learn. Syst. (2020).
  143. Lin Wu, Yang Wang, Xue Li, and Junbin Gao. 2018. Deep attention-based spatially recursive networks for fine-grained visual recognition. IEEE Trans. Cybern. 49, 5 (2018), 1791–1802.
  144. Lin Wu, Yang Wang, and Shirui Pan. 2016. Exploiting attribute correlations: A novel trace lasso-based weakly supervised dictionary learning method. IEEE Trans. Cybern. 47, 12 (2016), 4497–4508.
  145. Lin Wu, Yang Wang, and Ling Shao. 2018. Cycle-consistent deep generative hashing for cross-modal retrieval. IEEE Trans. Image Process. 28, 4 (2018), 1602–1612.
  146. Lin Wu, Yang Wang, Ling Shao, and Meng Wang. 2019. 3-D PersonVLAD: Learning deep global representations for video-based person reidentification. IEEE Trans. Neural Netw. Learn. Syst. 30, 11 (2019), 3347–3359.
  147. Lin Wu, Yang Wang, and John Shepherd. 2013. Efficient image and tag co-ranking: A Bregman divergence optimization method. In Proceedings of the ACM Conference on Multimedia.
  148. Lin Wu, Yang Wang, Hongzhi Yin, Meng Wang, and Ling Shao. 2020. Few-shot deep adversarial learning for video-based person re-identification. IEEE Trans. Image Process. (2020).
  149. Xiang Wu, Lingxiao Song, Ran He, and Tieniu Tan. 2018. Coupled deep learning for heterogeneous face recognition. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
  150. Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. 2014. Robust multi-view spectral clustering via low-rank and sparse decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence.
  151. De Xie, Cheng Deng, Chao Li, Xianglong Liu, and Dacheng Tao. 2020. Multi-task consistency-preserving adversarial hashing for cross-modal retrieval. IEEE Trans. Image Process. 29 (2020), 3626–3637.
  152. Xin Chen, Patrick J. Flynn, and Kevin W. Bowyer. 2005. IR and visible light face recognition. Comput. Vis. Image Underst. 99, 3 (2005), 332–358.
  153. Cai Xu, Ziyu Guan, Wei Zhao, Hongchang Wu, Yunfei Niu, and Beilei Ling. 2019. Adversarial incomplete multi-view clustering. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 3933–3939.
  154. Chang Xu, Dacheng Tao, and Chao Xu. 2014. Large-margin multi-view information bottleneck. IEEE Trans. Pattern Anal. Mach. Intell. 36, 8 (2014), 1559–1572.
  155. Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, and Xiaodong He. 2018. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1316–1324.
  156. Xing Xu, Huimin Lu, Jingkuan Song, Yang Yang, Heng Tao Shen, and Xuelong Li. 2019. Ternary adversarial networks with self-supervision for zero-shot cross-modal retrieval. IEEE Trans. Cybern. (2019).
  157. Qi Xuan, Zhuangzhi Chen, Yi Liu, Huimin Huang, Guanjun Bao, and Dan Zhang. 2019. Multiview generative adversarial network and its application in pearl classification. IEEE Trans. Ind. Electron. 66, 10 (2019), 8244–8252.
  158. Chenggang Yan, Biao Gong, Yuxuan Wei, and Yue Gao. 2020. Deep multi-view enhancement hashing for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. (2020).
  159. Linxiao Yang, Ngai-Man Cheung, Jiaying Li, and Jun Fang. 2019. Deep clustering by Gaussian mixture variational autoencoders with graph embedding. In Proceedings of the IEEE International Conference on Computer Vision. 6440–6449.
  160. Shijie Yang, Liang Li, Shuhui Wang, Weigang Zhang, and Qi Tian. 2019. SkeletonNet: A hybrid network with a skeleton-embedding process for multi-view image representation learning. IEEE Trans. Multimedia 21, 11 (2019), 2916–2929.
  161. Yang Yang, Yi-Feng Wu, De-Chuan Zhan, Zhi-Bin Liu, and Yuan Jiang. 2018. Complex object classification: A multi-modal multi-instance multi-label deep network with optimal transport. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2594–2603.
  162. Huaxiu Yao, Fei Wu, Jintao Ke, Xianfeng Tang, Yitian Jia, Siyu Lu, Pinghua Gong, Jieping Ye, and Zhenhui Li. 2018. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
  163. Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2 (2014), 67–78.
  164. Jing Yu, Yuhang Lu, Zengchang Qin, Weifeng Zhang, Yanbing Liu, Jianlong Tan, and Li Guo. 2018. Modeling text with graph convolutional network for cross-modal information retrieval. In Proceedings of the 19th Pacific-Rim Conference on Multimedia (PCM’18) (Lecture Notes in Computer Science), Richang Hong, Wen-Huang Cheng, Toshihiko Yamasaki, Meng Wang, and Chong-Wah Ngo (Eds.), Vol. 11164. Springer, 223–234.
  165. Jun Yu, Xiaokang Yang, Fei Gao, and Dacheng Tao. 2016. Deep multi-modal distance metric learning using click constraints for image ranking. IEEE Trans. Cybern. 47, 12 (2016), 4014–4024.
  166. Yuan Yuan, Zhitong Xiong, and Qi Wang. 2019. ACM: Adaptive cross-modal graph convolutional neural networks for RGB-D scene recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 9176–9184.
  167. Kun Zhan, Feiping Nie, Jing Wang, and Yi Yang. 2018. Multiview consensus graph clustering. IEEE Trans. Image Process. 28, 3 (2018), 1261–1270.
  168. Changqing Zhang, Huazhu Fu, Qinghua Hu, Xiaochun Cao, Yuan Xie, Dacheng Tao, and Dong Xu. 2018. Generalized latent multi-view subspace clustering. IEEE Trans. Pattern Anal. Mach. Intell. 42, 1 (2018), 86–99.
  169. Changqing Zhang, Zongbo Han, Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu, et al. 2019. CPM-Nets: Cross partial multi-view networks. In Advances in Neural Information Processing Systems. MIT Press, 557–567.
  170. Jian Zhang and Yuxin Peng. 2020. Multi-pathway generative adversarial hashing for unsupervised cross-modal retrieval. IEEE Trans. Multimedia 22, 1 (2020), 174–187.
  171. Jian Zhang, Yuxin Peng, and Mingkuan Yuan. 2018. Unsupervised generative adversarial cross-modal hashing. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
  172. Ying Zhang and Huchuan Lu. 2018. Deep cross-modal projection learning for image-text matching. In Proceedings of the European Conference on Computer Vision (ECCV’18). 686–701.
  173. Zheng Zhang, Li Liu, Fumin Shen, Heng Tao Shen, and Ling Shao. 2018. Binary multi-view clustering. IEEE Trans. Pattern Anal. Mach. Intell. 41, 7 (2018), 1774–1782.
  174. Feng Zheng, Yi Tang, and Ling Shao. 2018. Hetero-manifold regularisation for cross-modal hashing. IEEE Trans. Pattern Anal. Mach. Intell. 40, 5 (2018), 1059–1071.
  175. Pan Zhou, Yunqing Hou, and Jiashi Feng. 2018. Deep adversarial subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1596–1604.


      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 1s, January 2021, 353 pages.
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3453990

        Copyright © 2021 held by the owner/author(s). Publication rights licensed to ACM.

        Publisher: Association for Computing Machinery, New York, NY, United States

        Publication History

        • Received: 1 April 2020
        • Revised: 1 June 2020
        • Accepted: 1 June 2020
        • Published: 31 March 2021
        Published in TOMM Volume 17, Issue 1s
