Abstract
Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space through hash functions, enabling fast and flexible cross-modal retrieval. Most existing cross-modal hashing methods learn hash functions by mining the correlations among multimedia data, but ignore an important property of such data: each modality has features at different scales, such as texture, object, and scene features in an image, which provide complementary information that can boost the retrieval task. The correlations among multi-scale features are richer than the correlations between single features; they reveal finer underlying structures of the multimedia data and can be used for effective hash function learning. Therefore, we propose the Multi-scale Correlation Sequential Cross-modal Hashing (MCSCH) approach, whose main contributions can be summarized as follows: (1) A multi-scale feature guided sequential hashing learning method is proposed to share information among features of different scales through an RNN-based network and to generate the hash codes sequentially. The features of different scales guide hash code generation, which enhances the diversity of the hash codes and weakens the influence of errors in specific features, such as false object features caused by occlusion. (2) A multi-scale correlation mining strategy is proposed to align the features of different scales across modalities and to mine the correlations among the aligned features. These correlations reveal the finer underlying structure of multimedia data and help to boost hash function learning. (3) A correlation evaluation network evaluates the importance of the correlations to select the worthwhile ones and increases their impact on hash function learning.
Experiments on two widely used 2-media datasets and a 5-media dataset demonstrate the effectiveness of our proposed MCSCH approach.
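To make the two mechanical ideas in the abstract concrete (emitting hash bits step by step from features of different scales, and retrieving by Hamming distance), the following is a minimal sketch and not the MCSCH model itself. It assumes a plain tanh recurrent cell with random, untrained weights, a single modality, and feature vectors of equal dimension at every scale; the names `sequential_hash`, `hamming_rank`, and all sizes are hypothetical. The correlation mining and correlation evaluation network are not represented.

```python
import numpy as np

def sequential_hash(scale_features, W, U, V):
    """Emit a few bits per scale from a plain tanh recurrent cell (illustrative only)."""
    h = np.zeros(U.shape[0])
    bits = []
    for x in scale_features:                       # one feature vector per scale
        h = np.tanh(W @ x + U @ h)                 # hidden state shares information across scales
        bits.append((V @ h > 0).astype(np.uint8))  # threshold the projection into binary bits
    return np.concatenate(bits)                    # code grows by a fixed number of bits per step

def hamming_rank(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query, plus the distances."""
    dists = np.count_nonzero(database_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists

# Hypothetical sizes and random (untrained) weights, chosen only for the demo.
rng = np.random.default_rng(0)
feat_dim, hidden_dim, bits_per_step, n_scales = 8, 16, 4, 3
W = rng.standard_normal((hidden_dim, feat_dim))
U = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
V = rng.standard_normal((bits_per_step, hidden_dim))

# Five toy items, each with one feature vector per scale.
items = rng.standard_normal((5, n_scales, feat_dim))
codes = np.stack([sequential_hash(item, W, U, V) for item in items])

# Rank all items against the first one; it matches itself at distance 0.
order, dists = hamming_rank(codes[0], codes)
print(codes.shape, order, dists)
```

In a trained cross-modal system, each modality would have its own learned encoder producing codes in the same Hamming space, so a query code from one modality can be ranked against database codes from another with the same distance computation.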
Index Terms
Sequential Cross-Modal Hashing Learning via Multi-scale Correlation Mining