
Video Retrieval with Similarity-Preserving Deep Temporal Hashing

Published: 16 December 2019

Abstract

Although remarkable progress has been made in recent years, Content-based Video Retrieval (CBVR) remains an appealing research topic because of growing search demands in the Internet era of big data. This article aims to build an efficient CBVR system by discriminatively hashing videos into short binary codes. Existing video hashing methods usually suffer from two weaknesses: (1) Most works adopt either a pipeline of separate stages or an end-to-end architecture based on frame pooling; in both cases, the spatial-temporal properties of videos cannot be fully exploited or well preserved in the subsequent hashing step. (2) Discriminative learning based on pairwise or triplet constraints often suffers from slow convergence and poor local optima, mainly because each update uses only a limited number of samples. To alleviate these problems, we propose an end-to-end video retrieval framework called the Similarity-Preserving Deep Temporal Hashing (SPDTH) network. Specifically, we use stacked Gated Recurrent Units (GRUs) to capture the spatial-temporal properties of videos and to generate binary codes. This design unifies temporal modeling of videos and learning to hash in a single step, so that as much information as possible is retained. We also introduce a deep metric learning objective, ℓ2All_loss, which trains the network to preserve intra-class similarity and inter-class separability, and we minimize a quantization loss between the real-valued outputs and the binary codes. Extensive experiments on several challenging datasets demonstrate that SPDTH consistently outperforms state-of-the-art methods.
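
The abstract states only that the network's real-valued outputs are turned into short binary codes, that a quantization loss between the two is minimized, and that retrieval then runs on those codes. The following is a minimal Python sketch of that final stage under stated assumptions, not the authors' implementation: it assumes sign-based binarization to {-1, +1}, a squared-L2 quantization penalty (a common choice in deep hashing; the paper's exact form is not given here), and ranking of database videos by Hamming distance. The helper names (binarize, quantization_loss, pack_bits, hamming_distance, rank_by_hamming) are hypothetical, and neither the stacked-GRU encoder nor ℓ2All_loss is reproduced.

```python
from typing import List, Sequence, Tuple


def binarize(h: Sequence[float]) -> List[int]:
    """Map real-valued hash-layer outputs to a {-1, +1} code via the sign function.

    Assumption: zero maps to +1; the paper's tie-breaking rule is not stated.
    """
    return [1 if x >= 0 else -1 for x in h]


def quantization_loss(h: Sequence[float]) -> float:
    """Squared L2 gap between the outputs h and their binary code sign(h).

    Assumed form (common in deep hashing); SPDTH's exact loss is not given in the abstract.
    """
    return sum((b - x) ** 2 for b, x in zip(binarize(h), h))


def pack_bits(code: Sequence[int]) -> int:
    """Pack a {-1, +1} code into an int: bit i is set when code[i] == +1."""
    value = 0
    for i, c in enumerate(code):
        if c > 0:
            value |= 1 << i
    return value


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two packed codes (XOR, then count set bits)."""
    return bin(a ^ b).count("1")


def rank_by_hamming(
    query: Sequence[float], database: Sequence[Sequence[float]]
) -> List[Tuple[int, int]]:
    """Return (database_index, hamming_distance) pairs, nearest first.

    Ties are broken by database index.
    """
    q = pack_bits(binarize(query))
    scored = [
        (hamming_distance(q, pack_bits(binarize(v))), idx)
        for idx, v in enumerate(database)
    ]
    scored.sort()
    return [(idx, dist) for dist, idx in scored]
```

For example, with query outputs [0.9, -0.8, 0.7, -0.6], a database item whose signs match the query exactly ranks first at distance 0, and the quantization loss of the query is 0.30. Packing codes into integers means each comparison is one XOR plus a bit count, which is why short binary codes make large-scale retrieval cheap.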
