Abstract
With the development of web technology, multi-modal or multi-view data has surged as a major stream of big data, where each modality/view encodes an individual property of the data objects. Different modalities are often complementary to each other, a fact that has motivated considerable research on fusing multi-modal feature spaces to comprehensively characterize data objects. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces to deliver performance superior to their single-modal counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distributions of high-dimensional multimedia data, and they extend naturally to multi-modal data. Substantial empirical studies have demonstrated the advantages of deep multi-modal methods, which can essentially deepen the fusion of multi-modal deep feature spaces. In this article, we provide a comprehensive overview of the state of the art in multi-modal data analytics, from shallow to deep spaces. Throughout this survey, we further show that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints on some future directions for the field.
- Massih-Reza Amini, Nicolas Usunier, and Cyril Goutte. 2009. Learning from multiple partially observed views—An application to multilingual text categorization. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems. Google Scholar
Digital Library
- Galen Andrew, Raman Arora, Jeff A. Bilmes, and Karen Livescu. 2013. Deep canonical correlation analysis. In Proceedings of the 30th International Conference on Machine Learning (ICML’13) (JMLR Workshop and Conference Proceedings), Vol. 28. JMLR.org, 1247–1255. Google Scholar
Digital Library
- Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. 2017. Cross-modal scene networks. IEEE Trans. Pattern Anal. Mach. Intell. (2017), 1–1. Google Scholar
Digital Library
- Christian F. Baumgartner, Lisa M. Koch, Kerem Can Tezcan, Jia Xi Ang, and Ender Konukoglu. 2018. Visual feature attribution using Wasserstein GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8309–8319.Google Scholar
Cross Ref
- S. Bickel and T. Scheffer. 2004. Multi-view clustering. In Proceedings of the IEEE Conference on Data Mining (ICDM’04). Google Scholar
Digital Library
- Xiao Cai, Hua Wang, Heng Huang, and Chris Ding. 2012. Joint stage recognition and anatomical annotation of drosophila gene expression patterns. Bioinformatics 28, 12 (2012), i16–i24. Google Scholar
Digital Library
- Guanqun Cao, Alexandros Iosifidis, Moncef Gabbouj, Vijay Raghavan, and Raju Gottumukkala. 2018. Deep multi-view learning to rank. CoRR abs/1801.10402.Google Scholar
- Jie Cao, Yibo Hu, Bing Yu, Ran He, and Zhenan Sun. 2018. Load balanced GANs for multi-view face image synthesis. CoRR abs/1802.07447.Google Scholar
- Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, and Hua Zhang. 2015. Diversity-induced multi-view subspace clustering. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR’15).Google Scholar
Cross Ref
- Yue Cao, Mingsheng Long, Jianmin Wang, and Shichen Liu. 2017. Collective deep quantization for efficient cross-modal retrieval. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, Satinder P. Singh and Shaul Markovitch (Eds.). AAAI Press, 3974–3980. Google Scholar
Digital Library
- Lele Chen, Sudhanshu Srivastava, Zhiyao Duan, and Chenliang Xu. 2017. Deep cross-modal audio-visual generation. In Proceedings of the on Thematic Workshops of ACM Multimedia. 349–357. Google Scholar
Digital Library
- Mickaël Chen and Ludovic Denoyer. 2016. Multi-view generative adversarial networks. CoRR abs/1611.02019.Google Scholar
- Tanfang Chen, Shangfei Wang, and Shiyu Chen. 2017. Deep multi-modal network for multi-label classification. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME’17). IEEE, 955–960.Google Scholar
- Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems. MIT Press, 2172–2180. Google Scholar
Digital Library
- Zhen-Duo Chen, Wan-Jin Yu, Chuan-Xiang Li, Liqiang Nie, and Xin-Shun Xu. 2018. Dual deep neural networks cross-modal hashing. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.Google Scholar
- Jinjin Chi, Jihong Ouyang, Ximing Li, Yang Wang, and Meng Wang. 2019. Approximate optimal transport for continuous densities with copulas. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. Google Scholar
Cross Ref
- Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. 2018. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8789–8797.Google Scholar
Cross Ref
- LI Chongxuan, Taufik Xu, Jun Zhu, and Bo Zhang. 2017. Triple generative adversarial nets. In Advances in Neural Information Processing Systems. MIT Press, 4088–4098. Google Scholar
Digital Library
- Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yantao Zheng. 2009. NUS-WIDE: A real-world web image database from national university of singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval. 1–9. Google Scholar
Digital Library
- Cheng Deng, Zhaojia Chen, Xianglong Liu, Xinbo Gao, and Dacheng Tao. 2018. Triplet-based deep hashing network for cross-modal retrieval. IEEE Trans. Image Process. 27, 8 (2018), 3893–3903.Google Scholar
Cross Ref
- Zhijie Deng, Hao Zhang, Xiaodan Liang, Luona Yang, Shizhen Xu, Jun Zhu, and Eric P. Xing. 2017. Structured generative adversarial networks. In Advances in Neural Information Processing Systems. MIT Press, 3899–3909. Google Scholar
Digital Library
- Emily L. Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus. 2015. Deep generative image models using a laplacian pyramid of adversarial networks. In Proceedings of the Annual Conference on Neural Information Processing Systems, Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.). 1486–1494. Google Scholar
Digital Library
- Changxing Ding and Dacheng Tao. 2015. Robust face recognition via multi-modal deep face representation. IEEE Trans. Multimedia 17, 11 (2015), 2049–2058.Google Scholar
Cross Ref
- Anastasiia Doinychko and Massih-Reza Amini. 2020. Biconditional generative adversarial networks for multiview learning with missing views. In Proceedings of the 42nd European Conference on IR Research (ECIR’20) (Lecture Notes in Computer Science), Joemon M. Jose, Emine Yilmaz, João Magalhães, Pablo Castells, Nicola Ferro, Mário J. Silva, and Flávio Martins (Eds.), Vol. 12035. Springer, 807–820.Google Scholar
Cross Ref
- Changying Du, Changde Du, Xingyu Xie, Chen Zhang, and Hao Wang. 2018. Multi-view adversarially learned inference for cross-domain joint distribution matching. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), Yike Guo and Faisal Farooq (Eds.). ACM, 1348–1357. Google Scholar
Digital Library
- Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, and Aaron Courville. 2016. Adversarially learned inference. Retrieved from https://arXiv:1606.00704.Google Scholar
- Hugo Jair Escalante, Carlos A. Hernández, Jesús A. González, Aurelio López-López, Manuel Montes-y-Gómez, Eduardo F. Morales, Luis Enrique Sucar, Luis Villaseñor Pineda, and Michael Grubinger. 2010. The segmented and annotated IAPR TC-12 benchmark. Comput. Vis. Image Underst. 114, 4 (2010), 419–428. Google Scholar
Digital Library
- Fangxiang Feng, Xiaojie Wang, and Ruifan Li. 2014. Cross-modal retrieval with correspondence autoencoder. In Proceedings of the ACM International Conference on Multimedia (MM’14), Kien A. Hua, Yong Rui, Ralf Steinmetz, Alan Hanjalic, Apostol Natsev, and Wenwu Zhu (Eds.). ACM, 7–16. Google Scholar
Digital Library
- David Forsyth, Philip Torr, and Andrew Zisserman. 2008. Lecture notes in computer science: Computer vision. In Proceedings of the European Conference on Computer Vision (ECCV’08). 262–275. Google Scholar
Cross Ref
- Jingyue Gao, Xiting Wang, Yasha Wang, and Xing Xie. 2019. Explainable recommendation through attentive multi-view learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 3622–3629.Google Scholar
Digital Library
- Kamran Ghasedi, Xiaoqian Wang, Cheng Deng, and Heng Huang. 2019. Balanced self-paced learning for generative adversarial clustering network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4391–4400.Google Scholar
Cross Ref
- Kamran Ghasedi Dizaji, Xiaoqian Wang, and Heng Huang. 2018. Semi-supervised generative adversarial network for gene expression inference. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1435–1444. Google Scholar
Digital Library
- Kamran Ghasedi Dizaji, Feng Zheng, Najmeh Sadoughi, Yanhua Yang, Cheng Deng, and Heng Huang. 2018. Unsupervised deep generative adversarial hashing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3664–3673.Google Scholar
Cross Ref
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. MIT Press, 2672–2680. Google Scholar
Digital Library
- Jiuxiang Gu, Jianfei Cai, Shafiq R. Joty, Li Niu, and Gang Wang. 2018. Look, imagine and match: Improving textual-visual cross-modal retrieval with generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18). IEEE, 7181–7189.Google Scholar
Cross Ref
- Haiyun Guo, Jinqiao Wang, Yue Gao, Jianqiang Li, and Hanqing Lu. 2016. Multi-view 3D object retrieval with deep embedding network. IEEE Trans. Image Process. 25, 12 (2016), 5526–5537. Google Scholar
Digital Library
- Saurabh Gupta, Judy Hoffman, and Jitendra Malik. 2016. Cross modal distillation for supervision transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2827–2836.Google Scholar
Cross Ref
- Tatsuya Harada, Kuniaki Saito, Yusuke Mukuta, and Yoshitaka Ushiku. 2017. Deep modality invariant adversarial network for shared representation learning. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW’17). IEEE, 2623–2629.Google Scholar
Cross Ref
- Li He, Xing Xu, Huimin Lu, Yang Yang, and Heng Tao Shen. 2017. Unsupervised cross-modal retrieval through adversarial learning. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME’17).Google Scholar
Cross Ref
- Chaoqun Hong, Jun Yu, Jian Wan, Dacheng Tao, and Meng Wang. 2015. Multi-modal deep autoencoder for human pose recovery. IEEE Trans. Image Process. 24, 12 (2015), 5659–5670.Google Scholar
Cross Ref
- Junlin Hu, Jiwen Lu, and Yap-Peng Tan. 2017. Sharable and individual multi-view metric learning. IEEE Trans. Pattern Anal. Mach. Intell. 40, 9 (2017), 2281–2288.Google Scholar
Cross Ref
- Mengqiu Hu, Yang Yang, Fumin Shen, Ning Xie, Richang Hong, and Heng Tao Shen. 2018. Collective reconstructive embeddings for cross-modal hashing. IEEE Trans. Image Process. 28, 6 (2018), 2770--2784.Google Scholar
Cross Ref
- Peng Hu, Dezhong Peng, Yongsheng Sang, and Yong Xiang. 2019. Multi-view linear discriminant analysis network. IEEE Trans. Image Process. 28, 11 (2019), 5352–5365.Google Scholar
Digital Library
- Shuowen Hu, Jonghyun Choi, Alex L. Chan, and William Robson Schwartz. 2015. Thermal-to-visible face recognition using partial least squares. J. Optic. Soc. Amer. A: Optics Image Sci. Vision 32, 3 (2015), 431.Google Scholar
Cross Ref
- Zhanxuan Hu, Feiping Nie, Rong Wang, and Xuelong Li. 2020. Multi-view spectral clustering via integrating nonnegative embedding and spectral embedding. Info. Fusion 55 (2020), 251--259.Google Scholar
Cross Ref
- Hsin-Chien Huang, Yung-Yu Chuang, and Chu-Song Chen. 2012. Affinity aggregation for spectral clustering. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR’12). Google Scholar
Digital Library
- Sheng-Wei Huang, Che-Tsung Lin, Shu-Ping Chen, Yen-Yi Wu, Po-Hao Hsu, and Shang-Hong Lai. 2018. Auggan: Cross domain adaptation with gan-based data augmentation. In Proceedings of the European Conference on Computer Vision (ECCV’18). 718–731.Google Scholar
- Xin Huang, Yuxin Peng, and Mingkuan Yuan. 2020. MHTN: Modal-adversarial hybrid transfer network for cross-modal retrieval. IEEE Trans. Cybern. 50, 3 (2020), 1047–1059.Google Scholar
- Zhenyu Huang, J. Zhou, Xi Peng, Changqing Zhang, Hongyuan Zhu, and Jiancheng Lv. 2019. Multi-view spectral clustering network. In Proceedings of the 28th International Joint Conference on Artificial Intelligence.2563–2569. Google Scholar
Cross Ref
- Mark J. Huiskes and Michael S. Lew. 2008. The MIR flickr retrieval evaluation. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval. 39–43. Google Scholar
Digital Library
- Kui Jia, Jiehong Lin, Mingkui Tan, and Dacheng Tao. 2019. Deep multi-view learning using neuron-wise correlation-maximizing regularizers. IEEE Trans. Image Process. 28, 10 (2019), 5121–5134.Google Scholar
Digital Library
- Qing-Yuan Jiang and Wu-Jun Li. 2017. Deep cross-modal hashing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17). IEEE Computer Society, 3270–3278.Google Scholar
Cross Ref
- Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, and Qingming Huang. 2019. DM2C: Deep mixed-modal clustering. In Advances in Neural Information Processing Systems. MIT Press, 5880–5890.Google Scholar
- Yu-Gang Jiang, Guangnan Ye, Shih-Fu Chang, Daniel Ellis, and Alexander C. Loui. 2011. Consumer video understanding: A benchmark database and an evaluation of human and machine performance. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval. 1–8. Google Scholar
Digital Library
- Lu Jin, Kai Li, Zechao Li, Fu Xiao, Guo-Jun Qi, and Jinhui Tang. 2019. Deep semantic-preserving ordinal hashing for cross-modal similarity search. IEEE Trans. Neural Netw. Learn. Syst. 30, 5 (2019), 1429–1440.Google Scholar
Cross Ref
- Meina Kan, Shiguang Shan, and Xilin Chen. 2016. Multi-view deep network for cross-view classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4847–4855.Google Scholar
Cross Ref
- Gagan Kanojia and Shanmuganathan Raman. 2020. MIC-GAN: Multi-view assisted image completion using conditional generative adversarial networks. In Proceedings of the National Conference on Communications (NCC’20). IEEE, 1–6.Google Scholar
Cross Ref
- Seungryong Kim, Dongbo Min, Stephen Lin, and Kwanghoon Sohn. 2016. Deep self-correlation descriptor for dense cross-modal correspondence. In Proceedings of the European Conference on Computer Vision.Google Scholar
Cross Ref
- Abhishek Kumar, Piyush Rai, and Hal Daume. 2011. Co-regularized multi-view spectral clustering. In Proceedings of the Conference on Neural Information Processing Systems (NIPS’11). Google Scholar
Digital Library
- Ying-Hsiu Lai and Shang-Hong Lai. 2018. Emotion-preserving representation learning via generative adversarial network for multi-view facial expression recognition. In Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG’18). IEEE, 263–270.Google Scholar
Cross Ref
- Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. 1998. The MNIST database of handwritten digits. Retrieved from http://yann.lecun.com/exdb/mnist.Google Scholar
- Chao Li, Cheng Deng, Ning Li, Wei Liu, Xinbo Gao, and Dacheng Tao. 2018. Self-supervised adversarial hashing networks for cross-modal retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18). IEEE, 4242–4251.Google Scholar
Cross Ref
- Chao Li, Cheng Deng, Lei Wang, De Xie, and Xianglong Liu. 2019. Coupled cyclegan: Unsupervised hashing network for cross-modal retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 176–183.Google Scholar
Digital Library
- Dan Li, Changde Du, and Huiguang He. 2019. Semi-supervised cross-modal image generation with generative adversarial networks. Pattern Recogn. 100 (2019), 107085.Google Scholar
Cross Ref
- Diangang Li, Xing Wei, Xiaopeng Hong, and Yihong Gong. 2020. Infrared-visible cross-modal person re-identification with an X modality. In Proceedings of the AAAI Conference on Artificial Intelligence.Google Scholar
Cross Ref
- Jinxing Li, Hongwei Yong, Bob Zhang, Mu Li, Lei Zhang, and David Zhang. 2018. A probabilistic hierarchical model for multi-view and multi-feature classification. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.Google Scholar
- Runde Li, Jinshan Pan, Zechao Li, and Jinhui Tang. 2018. Single image dehazing via conditional generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8202–8211.Google Scholar
Cross Ref
- Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, and Xiaogang Wang. 2017. Person search with natural language description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1970--1979.Google Scholar
Cross Ref
- Xuelong Li, Di Hu, and Feiping Nie. 2017. Deep binary reconstruction for cross-modal hashing. In Proceedings of the ACM on Multimedia Conference (MM’17), Qiong Liu, Rainer Lienhart, Haohong Wang, Sheng-Wei “Kuan-Ta” Chen, Susanne Boll, Yi-Ping Phoebe Chen, Gerald Friedland, Jia Li, and Shuicheng Yan (Eds.). ACM, 1398–1406. Google Scholar
Digital Library
- Ximing Li and Yang Wang. 2020. Recovering accurate labeling information from partially valid data for effective multi-label learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’20).Google Scholar
Cross Ref
- Zhaoyang Li, Qianqian Wang, Zhiqiang Tao, Quanxue Gao, and Zhaohua Yang. 2019. Deep adversarial multi-view clustering network. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 2952–2958. Google Scholar
Cross Ref
- Chenhao Lin and Ajay Kumar. 2018. Contactless and partial 3D fingerprint recognition using multi-view deep representation. Pattern Recogn. 83 (2018), 314–327.Google Scholar
Cross Ref
- Tsung Yi Lin, Michael Maire, Serge Belongie, James Hays, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In European Conference on Computer Vision. Springer, Cham, 740--755.Google Scholar
- Venice Erin Liong, Jiwen Lu, Ling-Yu Duan, and Yap-Peng Tan. 2020. Deep variational and structural hashing. IEEE Trans. Pattern Anal. Mach. Intell. 42, 3 (2020), 580–595.Google Scholar
Cross Ref
- Venice Erin Liong, Jiwen Lu, Yap-Peng Tan, and Jie Zhou. 2017. Cross-modal deep variational hashing. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17). IEEE, 4097–4105.Google Scholar
Cross Ref
- Xin Liu, Zhikai Hu, Haibin Ling, and Yiu-ming Cheung. 2019. MTFH: A matrix tri-factorization hashing framework for efficient cross-modal retrieval. IEEE Trans. Pattern Anal. Mach. Intell. (2019).Google Scholar
- Xinwang Liu, Miaomiao Li, Lei Wang, Yong Dou, Jianping Yin, and En Zhu. 2017. Multiple kernel k-means with incomplete kernels. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, Satinder P. Singh and Shaul Markovitch (Eds.). AAAI Press, 2259–2265. Google Scholar
Digital Library
- Xuanwu Liu, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Yazhou Ren, and Maozu Guo. 2019. Ranking-based deep cross-modal hashing. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI’19), the 31st Innovative Applications of Artificial Intelligence Conference (IAAI’19), and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI’19). AAAI Press, 4400–4407.Google Scholar
- Xinwang Liu, Xinzhong Zhu, Miaomiao Li, Lei Wang, Chang Tang, Jianping Yin, Dinggang Shen, Huaimin Wang, and Wen Gao. 2019. Late fusion incomplete multi-view clustering. IEEE Trans. Pattern Anal. Mach. Intell. 41, 10 (2019), 2410–2423.Google Scholar
Digital Library
- Yi Luo, Guojie Song, Pengyu Li, and Zhongang Qi. 2018. Multi-task medical concept normalization using multi-view convolutional neural network. In Proceeedings of the 32nd AAAI Conference on Artificial Intelligence.Google Scholar
- Yihui Ma, Jia Jia, Suping Zhou, Jingtian Fu, Yejun Liu, and Zijian Tong. 2017. Towards better understanding the clothing fashion styles: A multi-modal deep learning approach. In Proceeedings of the 31st AAAI Conference on Artificial Intelligence. Google Scholar
Digital Library
- Omid Madani, Manfred Georg, and David A. Ross. 2012. On using nearly-independent feature families for high precision and confidence. In Proceeedings of the Asian Conference on Machine Learning. 269–284.Google Scholar
- Tahmida Mahmud, Mohammad Billah, and Amit K. Roy-Chowdhury. 2018. Multi-view frame reconstruction with conditional GAN. CoRR abs/1809.10352.Google Scholar
- Javier Marin, Aritro Biswas, Ferda Ofli, Nicholas Hynes, Amaia Salvador, Yusuf Aytar, Ingmar Weber, and Antonio Torralba. 2019. Recipe1m+: A dataset for learning cross-modal embeddings for cooking recipes and food images. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1 (2019), 187--203.Google Scholar
Digital Library
- G. J. McLachlan and T. Krishnan. 1997. The EM Algorithm and Its Extensions. Wiley.Google Scholar
- Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. Retrieved from https://arXiv:1411.1784.Google Scholar
- Sudipto Mukherjee, Himanshu Asnani, Eugene Lin, and Sreeram Kannan. 2019. Clustergan: Latent space clustering in generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 4610–4617.Google Scholar
Digital Library
- Feiping Nie, Guohao Cai, and Xuelong Li. 2017. Multi-view clustering and semi-supervised classification with adaptive neighbours. In Proceedings of the AAAI Conference on Artificial Intelligence. Google Scholar
Digital Library
- Feiping Nie, Jing Li, and Xuelong Li. 2016. Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’16). Google Scholar
Digital Library
- Feiping Nie, Shaojun Shi, and Xuelong Li. 2020. Auto-weighted multi-view co-clustering via fast matrix factorization. Pattern Recogn. 102 (2020), 107207.Google Scholar
Digital Library
- Yulei Niu, Zhiwu Lu, Ji-Rong Wen, Tao Xiang, and Shih-Fu Chang. 2018. Multi-modal multi-scale deep learning for large-scale image annotation. IEEE Trans. Image Process. 28, 4 (2018), 1720–1731.Google Scholar
Digital Library
- Yuxin Peng, Xin Huang, and Jinwei Qi. 2016. Cross-media shared representation by hierarchical learning with multiple deep networks. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI’16), Subbarao Kambhampati (Ed.). IJCAI/AAAI Press, 3846–3853. Google Scholar
Digital Library
- Yuxin Peng and Jinwei Qi. 2019. CM-GANs: Cross-modal generative adversarial networks for common representation learning. ACM Trans. Multim. Comput. Commun. Appl. 15, 1 (2019), 22:1–22:24. Google Scholar
Digital Library
- Yuxin Peng, Jinwei Qi, Xin Huang, and Yuxin Yuan. 2017. CCL: Cross-modal correlation learning with multi-grained fusion by hierarchical network. CoRR abs/1704.02116. Google Scholar
Digital Library
- Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Nikhil Rasiwasia, Gert R. G. Lanckriet, Roger Levy, and Nuno Vasconcelos. 2014. On the role of correlation and abstraction in cross-modal multimedia retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 36, 3 (2014), 521–535. Google Scholar
Digital Library
- Charles R. Qi, Hao Su, Matthias Nießner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas. 2016. Volumetric and multi-view cnns for object classification on 3D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5648–5656.Google Scholar
- Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised representation learning with deep convolutional generative adversarial networks. In Proceedings of the 4th International Conference on Learning Representations (ICLR’16), Yoshua Bengio and Yann LeCun (Eds.).Google Scholar
- Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. 2010. Collecting image annotations using amazon’s mechanical turk. In Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk. Google Scholar
Digital Library
- Scott Reed, Zeynep Akata, Honglak Lee, and Bernt Schiele. 2016. Learning deep representations of fine-grained visual descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 49--58.Google Scholar
Cross Ref
- Alexander Sage, Eirikur Agustsson, Radu Timofte, and Luc Van Gool. 2018. Logo synthesis and manipulation with clustered generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5879–5888.Google Scholar
Cross Ref
- Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training gans. In Advances in Neural Information Processing Systems. MIT Press, 2234–2242. Google Scholar
Digital Library
- M. Saquib Sarfraz and Rainer Stiefelhagen. 2017. Deep perceptual mapping for cross-modal face recognition. Int. J. Comput. Vision 122, 3 (2017), 426--438. Google Scholar
Digital Library
- Chao Shang, Aaron Palmer, Jiangwen Sun, Ko-Shin Chen, Jin Lu, and Jinbo Bi. 2017. VIGAN: Missing view imputation with generative adversarial networks. In Proceedings of the IEEE International Conference on Big Data (BigData’17), Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, and Masashi Toyoda (Eds.). IEEE Computer Society, 766–775.Google Scholar
Cross Ref
- Fei Shang, Huaxiang Zhang, Lei Zhu, and Jiande Sun. 2019. Adversarial cross-modal retrieval based on dictionary learning. Neurocomputing 355 (2019), 93–104.Google Scholar
Digital Library
- Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. 2012. Indoor segmentation and support inference from RGBD images. In Proceedings of the 12th European conference on Computer Vision. Google Scholar
Digital Library
- Lingyun Song, Jun Liu, Buyue Qian, Mingxuan Sun, Kuan Yang, Meng Sun, and Samar Abbas. 2018. A deep multi-modal CNN for multi-instance multi-label image classification. IEEE Trans. Image Process. 27, 12 (2018), 6025–6038. Google Scholar
Digital Library
- Shuran Song, Samuel P. Lichtenberg, and Jianxiong Xiao. 2015. Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 567--576.Google Scholar
Cross Ref
- Yiwei Sun, Suhang Wang, Tsung-Yu Hsieh, Xianfeng Tang, and Vasant G. Honavar. 2019. MEGAN: A generative adversarial network for multi-view network embedding. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI’19), Sarit Kraus (Ed.). ijcai.org, 3527–3533. Google Scholar
Cross Ref
- Zhiqiang Tao, Hongfu Liu, Jun Li, Zhaowen Wang, and Yun Fu. 2019. Adversarial graph embedding for ensemble clustering. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 3562–3568. Google Scholar
Cross Ref
- Yu Tian, Xi Peng, Long Zhao, Shaoting Zhang, and Dimitris N. Metaxas. 2018. CR-GAN: Learning complete representations for multi-view generation. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18), Jérôme Lang (Ed.). ijcai.org, 942–948. Google Scholar
Digital Library
- Rong-Cheng Tu, Xianling Mao, Bing Ma, Yong Hu, Tan Yan, Wei Wei, and Heyan Huang. 2019. Deep cross-modal hashing with hashing functions and unified hash codes jointly learning. CoRR abs/1907.12490.Google Scholar
- Martijn van Breukelen, Robert P. W. Duin, David M. J. Tax, and J. E. Den Hartog. 1998. Handwritten digit recognition by combined classifiers. Kybernetika 34, 4 (1998), 381–386.Google Scholar
- Virginia Espinosa-Duró, Marcos Faundez-Zanuy, and Jiří Mekyska. 2013. A new face database simultaneously acquired in visible, near-infrared and thermal spectrums. Cognitive Computation 5, 1 (2013), 119--135.Google Scholar
Cross Ref
- Bokun Wang, Yang Yang, Xing Xu, Alan Hanjalic, and Heng Tao Shen. 2017. Adversarial cross-modal retrieval. In Proceedings of the ACM on Multimedia Conference (MM’17), Qiong Liu, Rainer Lienhart, Haohong Wang, Sheng-Wei “Kuan-Ta” Chen, Susanne Boll, Yi-Ping Phoebe Chen, Gerald Friedland, Jia Li, and Shuicheng Yan (Eds.). ACM, 154–162. Google Scholar
Digital Library
- Cheng Wang, Haojin Yang, and Christoph Meinel. 2015. Deep semantic mapping for cross-modal retrieval. In Proceedings of the 27th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’15). IEEE Computer Society, 234–241. Google Scholar
Digital Library
- Daixin Wang, Peng Cui, Mingdong Ou, and Wenwu Zhu. 2015. Learning compact hash codes for multi-modal representations using orthogonal deep structure. IEEE Trans. Multimedia 17, 9 (2015), 1404–1416.Google Scholar
Cross Ref
- Huibing Wang, Yang Wang, Zhao Zhang, Xianping Fu, Mingliang Xu, and Meng Wang. 2020. Kernelized multiview subspace analysis by self-weighted learning. IEEE Trans. Multimedia (2020).Google Scholar
Digital Library
- Lichen Wang, Zhengming Ding, Zhiqiang Tao, Yunyu Liu, and Yun Fu. 2019. Generative multi-view human action recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV’19). IEEE, 6211–6220.Google Scholar
Cross Ref
- Meng Wang, Xian-Sheng Hua, Richang Hong, Jinhui Tang, Guo-Jun Qi, and Yan Song. 2009. Unified video annotation via multigraph learning. IEEE Trans. Circ. Syst. Video Technol. 19, 5 (2009), 733–746. Google Scholar
Digital Library
- Meng Wang, Hao Li, Dacheng Tao, Ke Lu, and Xindong Wu. 2012. Multi-modal graph-based reranking for web image search. IEEE Trans. Image Process. 21, 11 (2012), 4649–4611. Google Scholar
Digital Library
- Qianqian Wang, Zhengming Ding, Zhiqiang Tao, Quanxue Gao, and Yun Fu. 2018. Partial multi-view clustering via consistent GAN. In Proceedings of the IEEE International Conference on Data Mining (ICDM’18). IEEE, 1290–1295.Google Scholar
Cross Ref
- Tong Wang, Lei Zhu, Zhiyong Cheng, Jingjing Li, and Zan Gao. 2020. Unsupervised deep cross-modal hashing with virtual label regression. Neurocomputing 386 (2020), 84–96.Google Scholar
Cross Ref
- Xu Wang, Dezhong Peng, Peng Hu, and Yongsheng Sang. 2019. Adversarial correlated autoencoder for unsupervised multi-view representation learning. Knowl. Based Syst. 168 (2019), 109–120.Google Scholar
Cross Ref
- Yang Wang, Xiaodi Huang, and Lin Wu. 2013. Clustering via geometric median shift over riemannian manifolds. Info. Sci. 220 (2013), 292--305. Google Scholar
Digital Library
- Yang Wang, Xuemin Lin, Lin Wu, Qing Zhang, and Wenjie Zhang. 2016. Shifting multi-hypergraphs via collaborative probabilistic voting. Knowl. Info. Syst. 46, 3 (2016), 515–536.
- Yang Wang, Xuemin Lin, Lin Wu, and Wenjie Zhang. 2015. Effective multi-query expansions: Robust landmark retrieval. In Proceedings of the ACM Conference on Multimedia.
- Yang Wang, Xuemin Lin, Lin Wu, and Wenjie Zhang. 2017. Effective multi-query expansions: Collaborative deep networks for robust landmark retrieval. IEEE Trans. Image Process. 26, 3 (2017), 1393–1404.
- Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. 2014. Exploiting correlation consensus: Towards subspace clustering for multi-modal data. In Proceedings of the ACM Conference on Multimedia.
- Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, and Qing Zhang. 2015. LBMCH: Learning bridging mapping for cross-modal hashing. In Proceedings of the ACM SIGIR Conference.
- Yang Wang, Xuemin Lin, Lin Wu, Wenjie Zhang, Qing Zhang, and Xiaodi Huang. 2015. Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Trans. Image Process. 24, 11 (2015), 3939–3949.
- Yang Wang and Lin Wu. 2018. Beyond low-rank representations: Orthogonal clustering basis reconstruction with optimized graph structure for multi-view spectral clustering. Neural Netw. 103 (2018), 1–8.
- Yang Wang, Lin Wu, Xuemin Lin, and Junbin Gao. 2018. Multiview spectral clustering via structured low-rank matrix factorization. IEEE Trans. Neural Netw. Learn. Syst. 29, 10 (2018), 4833–4843.
- Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, Meng Fang, and Shirui Pan. 2016. Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’16).
- Yang Wang, Wenjie Zhang, Lin Wu, Xuemin Lin, and Xiang Zhao. 2015. Unsupervised metric fusion over multiview data by graph random walk-based cross-view diffusion. IEEE Trans. Neural Netw. Learn. Syst. 28, 1 (2015), 57–70.
- Wei Li, Rui Zhao, Tong Xiao, and Xiaogang Wang. 2014. DeepReID: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Yunchao Wei, Yao Zhao, Canyi Lu, Shikui Wei, Luoqi Liu, Zhenfeng Zhu, and Shuicheng Yan. 2017. Cross-modal retrieval with CNN visual features: A new baseline. IEEE Trans. Cybern. 47, 2 (2017), 449–460.
- Xin Wen, Zhizhong Han, Xinyu Yin, and Yu-Shen Liu. 2019. Adversarial cross-modal retrieval via learning and transferring single-modal similarities. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME’19). IEEE, 478–483.
- Gengshen Wu, Zijia Lin, Jungong Han, Li Liu, Guiguang Ding, Baochang Zhang, and Jialie Shen. 2018. Unsupervised deep hashing via binary latent factor models for large-scale cross-modal retrieval. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’18). 2854–2860.
- Lin Wu, Richang Hong, Yang Wang, and Meng Wang. 2019. Cross-entropy adversarial view adaptation for person re-identification. IEEE Trans. Circ. Syst. Video Technol. 30, 7 (2019), 2081–2092.
- Lin Wu and Yang Wang. 2017. Robust hashing for multi-view data: Jointly learning low-rank kernelized similarity consensus and hash functions. Image Vision Comput. 57 (2017), 58–66.
- Lin Wu, Yang Wang, Junbin Gao, and Xue Li. 2018. Where-and-when to look: Deep siamese attention networks for video-based person re-identification. IEEE Trans. Multimedia 21, 6 (2018), 1412–1424.
- Lin Wu, Yang Wang, Junbin Gao, Meng Wang, Zheng-Jun Zha, and Dacheng Tao. 2020. Deep co-attention based comparators for relative representation learning in person re-identification. IEEE Trans. Neural Netw. Learn. Syst. (2020).
- Lin Wu, Yang Wang, Xue Li, and Junbin Gao. 2018. Deep attention-based spatially recursive networks for fine-grained visual recognition. IEEE Trans. Cybernet. 49, 5 (2018), 1791–1802.
- Lin Wu, Yang Wang, and Shirui Pan. 2016. Exploiting attribute correlations: A novel trace lasso-based weakly supervised dictionary learning method. IEEE Trans. Cybernet. 47, 12 (2016), 4497–4508.
- Lin Wu, Yang Wang, and Ling Shao. 2018. Cycle-consistent deep generative hashing for cross-modal retrieval. IEEE Trans. Image Process. 28, 4 (2018), 1602–1612.
- Lin Wu, Yang Wang, Ling Shao, and Meng Wang. 2019. 3-D PersonVLAD: Learning deep global representations for video-based person reidentification. IEEE Trans. Neural Netw. Learn. Syst. 30, 11 (2019), 3347–3359.
- Lin Wu, Yang Wang, and John Shepherd. 2013. Efficient image and tag co-ranking: A Bregman divergence optimization method. In Proceedings of the ACM Conference on Multimedia.
- Lin Wu, Yang Wang, Hongzhi Yin, Meng Wang, and Ling Shao. 2020. Few-shot deep adversarial learning for video-based person re-identification. IEEE Trans. Image Process. (2020).
- Xiang Wu, Lingxiao Song, Ran He, and Tieniu Tan. 2018. Coupled deep learning for heterogeneous face recognition. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
- Rongkai Xia, Yan Pan, Lei Du, and Jian Yin. 2014. Robust multi-view spectral clustering via low-rank and sparse decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence.
- De Xie, Cheng Deng, Chao Li, Xianglong Liu, and Dacheng Tao. 2020. Multi-task consistency-preserving adversarial hashing for cross-modal retrieval. IEEE Trans. Image Process. 29 (2020), 3626–3637.
- Xin Chen, Patrick J. Flynn, and Kevin W. Bowyer. 2005. IR and visible light face recognition.
- Cai Xu, Ziyu Guan, Wei Zhao, Hongchang Wu, Yunfei Niu, and Beilei Ling. 2019. Adversarial incomplete multi-view clustering. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 3933–3939.
- Chang Xu, Dacheng Tao, and Chao Xu. 2014. Large-margin multi-view information bottleneck. IEEE Trans. Pattern Anal. Mach. Intell. 36, 8 (2014), 1559–1572.
- Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, and Xiaodong He. 2018. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1316–1324.
- Xing Xu, Huimin Lu, Jingkuan Song, Yang Yang, Heng Tao Shen, and Xuelong Li. 2019. Ternary adversarial networks with self-supervision for zero-shot cross-modal retrieval. IEEE Trans. Cybernet. (2019).
- Qi Xuan, Zhuangzhi Chen, Yi Liu, Huimin Huang, Guanjun Bao, and Dan Zhang. 2019. Multiview generative adversarial network and its application in pearl classification. IEEE Trans. Ind. Electron. 66, 10 (2019), 8244–8252.
- Chenggang Yan, Biao Gong, Yuxuan Wei, and Yue Gao. 2020. Deep multi-view enhancement hashing for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. (2020).
- Linxiao Yang, Ngai-Man Cheung, Jiaying Li, and Jun Fang. 2019. Deep clustering by Gaussian mixture variational autoencoders with graph embedding. In Proceedings of the IEEE International Conference on Computer Vision. 6440–6449.
- Shijie Yang, Liang Li, Shuhui Wang, Weigang Zhang, and Qi Tian. 2019. SkeletonNet: A hybrid network with a skeleton-embedding process for multi-view image representation learning. IEEE Trans. Multimedia 21, 11 (2019), 2916–2929.
- Yang Yang, Yi-Feng Wu, De-Chuan Zhan, Zhi-Bin Liu, and Yuan Jiang. 2018. Complex object classification: A multi-modal multi-instance multi-label deep network with optimal transport. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2594–2603.
- Huaxiu Yao, Fei Wu, Jintao Ke, Xianfeng Tang, Yitian Jia, Siyu Lu, Pinghua Gong, Jieping Ye, and Zhenhui Li. 2018. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
- Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2 (2014), 67–78.
- Jing Yu, Yuhang Lu, Zengchang Qin, Weifeng Zhang, Yanbing Liu, Jianlong Tan, and Li Guo. 2018. Modeling text with graph convolutional network for cross-modal information retrieval. In Proceedings of the 19th Pacific-Rim Conference on Multimedia (PCM’18) (Lecture Notes in Computer Science), Richang Hong, Wen-Huang Cheng, Toshihiko Yamasaki, Meng Wang, and Chong-Wah Ngo (Eds.), Vol. 11164. Springer, 223–234.
- Jun Yu, Xiaokang Yang, Fei Gao, and Dacheng Tao. 2016. Deep multi-modal distance metric learning using click constraints for image ranking. IEEE Trans. Cybernet. 47, 12 (2016), 4014–4024.
- Yuan Yuan, Zhitong Xiong, and Qi Wang. 2019. ACM: Adaptive cross-modal graph convolutional neural networks for RGB-D scene recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 9176–9184.
- Kun Zhan, Feiping Nie, Jing Wang, and Yi Yang. 2018. Multiview consensus graph clustering. IEEE Trans. Image Process. 28, 3 (2018), 1261–1270.
- Changqing Zhang, Huazhu Fu, Qinghua Hu, Xiaochun Cao, Yuan Xie, Dacheng Tao, and Dong Xu. 2018. Generalized latent multi-view subspace clustering. IEEE Trans. Pattern Anal. Mach. Intell. 42, 1 (2018), 86–99.
- Changqing Zhang, Zongbo Han, Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu, et al. 2019. CPM-Nets: Cross partial multi-view networks. In Advances in Neural Information Processing Systems. MIT Press, 557–567.
- Jian Zhang and Yuxin Peng. 2020. Multi-pathway generative adversarial hashing for unsupervised cross-modal retrieval. IEEE Trans. Multimedia 22, 1 (2020), 174–187.
- Jian Zhang, Yuxin Peng, and Mingkuan Yuan. 2018. Unsupervised generative adversarial cross-modal hashing. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
- Ying Zhang and Huchuan Lu. 2018. Deep cross-modal projection learning for image-text matching. In Proceedings of the European Conference on Computer Vision (ECCV’18). 686–701.
- Zheng Zhang, Li Liu, Fumin Shen, Heng Tao Shen, and Ling Shao. 2018. Binary multi-view clustering. IEEE Trans. Pattern Anal. Mach. Intell. 41, 7 (2018), 1774–1782.
- Feng Zheng, Yi Tang, and Ling Shao. 2018. Hetero-manifold regularisation for cross-modal hashing. IEEE Trans. Pattern Anal. Mach. Intell. 40, 5 (2018), 1059–1071.
- Pan Zhou, Yunqing Hou, and Jiashi Feng. 2018. Deep adversarial subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1596–1604.
Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion