Abstract
Video decolorization converts three-channel color videos into single-channel grayscale videos; in essence, it is a decolorization operation applied to video frames. Most existing video decolorization algorithms simply apply image decolorization methods to each frame. However, considering only the single-frame decolorization result inevitably causes temporal inconsistency and flicker, in which the same local content in consecutive frames is mapped to different gray values. In addition, consecutive frames often share similar local content features, which constitutes redundant information. To address these problems, this article proposes a novel video decolorization algorithm based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network. First, we design a local semantic content encoder that learns and extracts the shared local content of consecutive video frames, which better preserves the contrast of the frames. Second, a temporal feature controller based on bidirectional recurrent neural networks with LSTM units refines the local semantic features, largely maintaining the temporal consistency of the video sequence and eliminating flicker. Finally, we use deconvolution to decode the features and produce the grayscale video sequence. Experiments indicate that our method preserves the local contrast of video frames and the temporal consistency of the sequence better than the state of the art.
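The pipeline described above can be sketched in miniature: a per-frame encoder, a bidirectional temporal pass over the frame features, and a per-frame decoder that emits one gray channel. This is a toy shape-level sketch, not the trained model from the paper: the layer sizes, the use of 1×1 channel-mixing matrices in place of the CNN encoder/decoder, and the plain tanh recurrence standing in for full LSTM cells are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, H, W = 5, 8, 8          # frames, height, width
C_in, C_feat = 3, 16       # RGB input channels, feature channels

# 1x1 "convolutions" as channel-mixing matrices (stand-ins for the
# CNN encoder and deconvolutional decoder in the paper).
W_enc = rng.standard_normal((C_feat, C_in)) * 0.1
W_fwd = rng.standard_normal((C_feat, C_feat)) * 0.1
W_bwd = rng.standard_normal((C_feat, C_feat)) * 0.1
W_dec = rng.standard_normal((1, 2 * C_feat)) * 0.1

video = rng.random((T, C_in, H, W))  # a random color clip

# Per-frame local semantic encoding.
feats = np.einsum('fc,tchw->tfhw', W_enc, video)

# Bidirectional temporal pass: each frame's features are refined by a
# recurrent state running forward and backward over the sequence, so
# the gray value of shared local content is conditioned on neighbors.
h_f = np.zeros((C_feat, H, W)); fwd = []
for t in range(T):
    h_f = np.tanh(feats[t] + np.einsum('fg,ghw->fhw', W_fwd, h_f))
    fwd.append(h_f)
h_b = np.zeros((C_feat, H, W)); bwd = [None] * T
for t in reversed(range(T)):
    h_b = np.tanh(feats[t] + np.einsum('fg,ghw->fhw', W_bwd, h_b))
    bwd[t] = h_b

# Decode the concatenated forward/backward states to one gray channel.
states = np.concatenate([np.stack(fwd), np.stack(bwd)], axis=1)  # (T, 2F, H, W)
gray = np.einsum('of,tfhw->tohw', W_dec, states)

print(gray.shape)  # one grayscale channel per input frame
```

The bidirectional pass is what distinguishes this from frame-by-frame decolorization: each output frame depends on states propagated from both earlier and later frames, which is the mechanism the paper relies on to suppress flicker.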
Index Terms
Video Decolorization Based on the CNN and LSTM Neural Network