Video Decolorization Based on the CNN and LSTM Neural Network

Published: 22 July 2021

Abstract

Video decolorization is the process of converting three-channel color videos into single-channel grayscale videos; it is essentially the decolorization of individual video frames. Most existing video decolorization algorithms directly apply image decolorization methods to each video frame. However, considering only the single-frame decolorization result inevitably causes temporal inconsistency and a flicker phenomenon, meaning that the same local content in consecutive video frames may be mapped to different gray values. In addition, consecutive video frames often share similar local content features, which constitutes redundant information. To solve these problems, this article proposes a novel video decolorization algorithm based on the convolutional neural network (CNN) and the long short-term memory (LSTM) neural network. First, we design a local semantic content encoder to learn and extract the shared local content of consecutive video frames, which better preserves the contrast of video frames. Second, a temporal feature controller based on bi-directional recurrent neural networks with LSTM units refines the local semantic features, which maintains the temporal consistency of the video sequence and eliminates the flicker phenomenon. Finally, we use deconvolution to decode the features and produce the grayscale video sequence. Experiments indicate that our method preserves the local contrast of video frames and the temporal consistency better than the state of the art.
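As a minimal illustration of the flicker problem the abstract describes, the sketch below is not the paper's CNN+LSTM model: it decolorizes each frame independently with the standard Rec. 601 luma weights, then applies a simple recurrent blend across frames as a crude stand-in for temporal feature propagation. All function names here are hypothetical, chosen only for this example.

```python
# Illustrative sketch (NOT the paper's network): per-frame decolorization
# vs. a temporally smoothed variant. Independent per-frame conversion can
# produce large gray-value jumps between consecutive frames ("flicker");
# propagating information from previous frames suppresses those jumps.

def luminance(pixel):
    """Standard Rec. 601 luma weights for a single (R, G, B) pixel."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def decolorize_frames(frames):
    """Independently decolorize each frame (a list of RGB-tuple pixels)."""
    return [[luminance(p) for p in frame] for frame in frames]

def temporal_smooth(gray_frames, alpha=0.5):
    """Blend each gray frame with the previous output frame, a crude
    stand-in for recurrent (LSTM-style) temporal feature propagation."""
    out = [gray_frames[0]]
    for frame in gray_frames[1:]:
        prev = out[-1]
        out.append([alpha * g + (1 - alpha) * p
                    for g, p in zip(frame, prev)])
    return out

# Three frames of a one-pixel "video" whose color shifts between frames:
video = [[(200, 40, 40)], [(40, 200, 40)], [(40, 40, 200)]]
gray = decolorize_frames(video)       # flickers: 87.84, 133.92, 58.24
smooth = temporal_smooth(gray)        # smaller frame-to-frame jumps
```

The frame-to-frame gray difference after smoothing is half that of the independent conversion in this toy case; the paper's contribution is to learn such temporal refinement with bi-directional LSTM units over CNN features rather than using a fixed blend.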



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
  August 2021, 443 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3476118

  Copyright © 2021 Association for Computing Machinery.

  Publisher: Association for Computing Machinery, New York, NY, United States

• Publication History

  • Received: 1 May 2020
  • Revised: 1 November 2020
  • Accepted: 1 December 2020
  • Published: 22 July 2021

• Qualifiers

  • research-article
  • Refereed