research-article

3D Tensor Auto-encoder with Application to Video Compression

Published: 11 May 2021

Abstract

Auto-encoders have been widely used to compress high-dimensional data such as images and videos. However, a traditional auto-encoder network needs to store a large number of parameters: when the input data is of dimension n, the number of parameters in an auto-encoder is in general O(n). In this article, we introduce a network structure called the 3D Tensor Auto-Encoder (3DTAE). Unlike the traditional auto-encoder, in which a video is represented as a vector, 3DTAE treats videos as 3D tensors and passes tensor objects directly through the network. The weights of each layer are represented by three small matrices, and thus the number of parameters in 3DTAE is just O(n^{1/3}). The compact nature of 3DTAE fits the needs of video compression well. Given an ensemble of high-dimensional videos, we represent them as 3DTAE networks plus some small core tensors, and we further quantize the network parameters and the core tensors to obtain the final compressed data. Experimental results verify the efficiency of 3DTAE.
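
To make the parameter argument concrete, here is a minimal NumPy sketch of one layer whose weights are three small matrices applied to a 3D tensor. The abstract does not give the exact layer equations, so the mode-product form, the tanh activation, the tensor sizes, and all function names below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def mode_product(x, u, mode):
    """Multiply tensor x by matrix u along axis `mode`.

    u has shape (m, x.shape[mode]); the result has size m along that axis.
    """
    # tensordot contracts u's columns with the chosen axis and places the
    # new axis first; moveaxis puts it back in the original position.
    return np.moveaxis(np.tensordot(u, x, axes=(1, mode)), 0, mode)

def tensor_layer(x, u1, u2, u3):
    """Hypothetical tensor layer: one small matrix per mode, then tanh."""
    y = mode_product(x, u1, 0)
    y = mode_product(y, u2, 1)
    y = mode_product(y, u3, 2)
    return np.tanh(y)

def tensor_param_count(in_shape, out_shape):
    """Weights stored by the three per-mode matrices."""
    return sum(m * n for m, n in zip(out_shape, in_shape))

def dense_param_count(in_shape, out_shape):
    """Weights of a fully connected layer on the vectorized input."""
    return int(np.prod(in_shape)) * int(np.prod(out_shape))

# Assumed sizes: a 32 x 32 x 16 video block mapped to a 4 x 4 x 2 core tensor.
rng = np.random.default_rng(0)
in_shape, core_shape = (32, 32, 16), (4, 4, 2)
video = rng.standard_normal(in_shape)
u1, u2, u3 = (0.1 * rng.standard_normal((m, n))
              for m, n in zip(core_shape, in_shape))

core = tensor_layer(video, u1, u2, u3)
print(core.shape)                                  # (4, 4, 2)
print(tensor_param_count(in_shape, core_shape))    # 288
print(dense_param_count(in_shape, core_shape))     # 524288
```

With the output sizes held fixed, the tensor layer's weight count grows with n1 + n2 + n3 rather than with the product n = n1 n2 n3, which is on the order of n^{1/3} when the three dimensions are comparable.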

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2
May 2021
410 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3461621

Copyright © 2021 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 11 May 2021
• Accepted: 1 October 2020
• Revised: 1 September 2020
• Received: 1 November 2019

Qualifiers

• research-article
• Research
• Refereed
