research-article

Sequential Cross-Modal Hashing Learning via Multi-scale Correlation Mining

Published: 26 December 2019

Abstract

Cross-modal hashing maps heterogeneous multimedia data into a common Hamming space through hash functions, enabling fast and flexible cross-modal retrieval. Most existing cross-modal hashing methods learn hash functions by mining the correlations among multimedia data, but they ignore an important property of such data: each modality has features at different scales, such as texture, object, and scene features in an image, which provide complementary information for the retrieval task. The correlations among multi-scale features are richer than the correlations between single features; they reveal finer underlying structure in the multimedia data and can be exploited for effective hash function learning. We therefore propose the Multi-scale Correlation Sequential Cross-modal Hashing (MCSCH) approach, whose main contributions are as follows. (1) A multi-scale feature guided sequential hashing learning method shares information among features of different scales through an RNN-based network and generates the hash codes sequentially. Using features of different scales to guide hash code generation enhances the diversity of the hash codes and weakens the influence of errors in specific features, such as false object features caused by occlusion. (2) A multi-scale correlation mining strategy aligns the features of different scales across modalities and mines the correlations among the aligned features. These correlations reveal the finer underlying structure of multimedia data and help boost hash function learning. (3) A correlation evaluation network evaluates the importance of the correlations, selects the worthwhile ones, and increases their impact on hash function learning. Experiments on two widely used 2-media datasets and a 5-media dataset demonstrate the effectiveness of the proposed MCSCH approach.
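
To make the retrieval setting described in the abstract concrete, the following is a minimal sketch of ranking items in a common Hamming space. It is not the MCSCH method: the embedding values are made up and merely stand in for the outputs of per-modality hash functions, and the helper names `binarize`, `hamming_distance`, and `rank_by_hamming` are hypothetical.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> int:
    """Pack the signs of a real-valued embedding into an integer hash code."""
    code = 0
    for value in embedding:
        # Non-negative entries become bit 1, negative entries become bit 0.
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two hash codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: Sequence[int]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Assumed 4-dimensional outputs of an image-side and a text-side hash function.
image_embeddings = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.8, 0.6],
    [0.7, -0.1, -0.3, -0.9],
]
text_query_embedding = [0.8, -0.4, 0.2, -0.6]

image_codes = [binarize(e) for e in image_embeddings]
query_code = binarize(text_query_embedding)

# Text-to-image retrieval: rank images by Hamming distance to the text query.
ranking = rank_by_hamming(query_code, image_codes)
print(ranking)
```

Because both modalities are encoded as bit strings of the same length, comparing a text query against an image database reduces to XOR and bit counting, which is what makes retrieval in Hamming space fast.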

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 15, Issue 4
November 2019
322 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3376119

Copyright © 2019 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 26 December 2019
• Accepted: 1 August 2019
• Revised: 1 May 2019
• Received: 1 November 2018

      Qualifiers

      • research-article
      • Research
      • Refereed
