Abstract
Auto-encoders have been widely used to compress high-dimensional data such as images and videos. However, a traditional auto-encoder network needs to store a large number of parameters: when the input data is of dimension n, the number of parameters in an auto-encoder is in general O(n). In this article, we introduce a network structure called the 3D Tensor Auto-Encoder (3DTAE). Unlike the traditional auto-encoder, in which a video is represented as a vector, 3DTAE treats videos as 3D tensors and passes the tensor objects directly through the network. The weights of each layer are represented by three small matrices, so the number of parameters in 3DTAE is just O(n^{1/3}). This compactness fits the needs of video compression well. Given an ensemble of high-dimensional videos, we represent them as 3DTAE networks plus some small core tensors, and we further quantize the network parameters and the core tensors to obtain the final compressed data. Experimental results verify the efficiency of 3DTAE.
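The idea of a layer whose weights are three small matrices can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `tensor_layer` and `param_counts`, the tanh activation, the absence of bias terms, and the tensor sizes are all assumptions chosen for illustration. Each matrix acts along one mode of the 3D input, and the parameter count is compared with that of a hypothetical dense layer mapping the flattened input to the flattened output.

```python
import numpy as np


def tensor_layer(x, u1, u2, u3):
    """Multiply a 3D tensor by one small matrix per mode, then apply tanh.

    x has shape (n1, n2, n3); u_k has shape (m_k, n_k);
    the result has shape (m1, m2, m3).
    The tanh activation is an illustrative assumption.
    """
    y = np.einsum("ai,bj,ck,ijk->abc", u1, u2, u3, x)
    return np.tanh(y)


def param_counts(shape_in, shape_out):
    """Return (dense, factored) weight counts for one layer, biases ignored."""
    # Hypothetical dense layer on flattened tensors: one weight per (input, output) pair.
    dense = int(np.prod(shape_in)) * int(np.prod(shape_out))
    # Factored layer: three matrices of size m_k x n_k.
    factored = sum(m * n for m, n in zip(shape_out, shape_in))
    return dense, factored


rng = np.random.default_rng(0)
shape_in, shape_out = (32, 32, 16), (8, 8, 4)  # illustrative sizes only

video = rng.standard_normal(shape_in)  # stand-in for a small video clip
weights = [0.1 * rng.standard_normal((m, n)) for m, n in zip(shape_out, shape_in)]

core = tensor_layer(video, *weights)  # small "core" tensor
dense, factored = param_counts(shape_in, shape_out)
print(core.shape, dense, factored)
```

With these sizes the factored layer stores 576 weights, whereas the dense layer on flattened tensors would store 4,194,304. The factored layer is equivalent to a dense layer whose weight matrix is constrained to be a Kronecker product of the three small matrices, which is where the saving comes from.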
Index Terms
3D Tensor Auto-encoder with Application to Video Compression