Abstract
Despite remarkable progress in recent years, Content-based Video Retrieval (CBVR) remains an appealing research topic because of growing search demands in the Internet era of big data. This article explores an efficient CBVR system that discriminatively hashes videos into short binary codes. Existing video hashing methods usually suffer from two weaknesses: (1) Most works adopt either a separated-stages approach or a frame-pooling-based end-to-end architecture, so the spatial-temporal properties of videos cannot be fully explored or well preserved in the subsequent hashing step. (2) Discriminative learning based on pairwise or triplet constraints often suffers from slow convergence and poor local optima, mainly because of the limited number of samples used in each update. To alleviate these problems, we propose an end-to-end video retrieval framework called the Similarity-Preserving Deep Temporal Hashing (SPDTH) network. Specifically, we use stacked Gated Recurrent Units (GRUs) to capture the spatial-temporal properties of videos and to generate binary codes. The model unifies video temporal modeling and learning to hash into one step to retain as much information as possible. We also introduce a deep metric learning objective called ℓ2All_loss for network training, which preserves intra-class similarity and inter-class separability, and we minimize a quantization loss between the real-valued outputs and the binary codes. Extensive experiments on several challenging datasets demonstrate that SPDTH consistently outperforms state-of-the-art methods.
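To make the quantization and retrieval steps concrete, the following is a minimal sketch, not the SPDTH implementation. The abstract does not give the exact form of the quantization loss, so a mean squared gap between the real-valued outputs and their sign is assumed here. All function names are hypothetical, and ℓ2All_loss is not sketched because its definition does not appear in the abstract.

```python
from typing import List, Sequence


def binarize(outputs: Sequence[float]) -> List[int]:
    """Map real-valued outputs to a {-1, +1} code by sign (0 maps to +1)."""
    return [1 if value >= 0 else -1 for value in outputs]


def quantization_loss(outputs: Sequence[float]) -> float:
    """Assumed form: mean squared gap between outputs and their binarized code."""
    code = binarize(outputs)
    return sum((o - b) ** 2 for o, b in zip(outputs, code)) / len(outputs)


def hamming_distance(code_a: Sequence[int], code_b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(1 for a, b in zip(code_a, code_b) if a != b)


def rank_by_hamming(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered by Hamming distance to the query.

    sorted() is stable, so equally distant items keep their index order.
    """
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


if __name__ == "__main__":
    # Hypothetical real-valued network outputs for one query and three videos.
    query_code = binarize([0.8, 0.3, 0.6])
    database_codes = [binarize(v) for v in ([-0.9, -0.4, -0.7], [0.7, 0.2, 0.9], [0.5, -0.6, 0.4])]
    print(rank_by_hamming(query_code, database_codes))
```

Under this assumed loss, outputs already close to ±1 incur almost no penalty, so the binarization step discards little information. Ranking then needs only Hamming distances between short codes, which is what makes hashing-based retrieval efficient.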
Video Retrieval with Similarity-Preserving Deep Temporal Hashing