Abstract
Real-time ultra-high-definition (UHD) video applications have attracted much attention, and the encoder side urgently needs high-throughput hardware implementations of the two-dimensional (2D) transform for the latest video coding standards. This article proposes an effective method for accelerating the transform algorithm in UHD intra coding based on the third generation of the audio video coding standard (AVS3). First, guided by a detailed statistical analysis, we devise an efficient, hardware-friendly transform algorithm that markedly reduces running cycles and resource consumption. Second, to implement multiplierless computation that saves resources and power, we investigate a series of shift-and-add units (SAUs) that use far fewer shifters and adders than existing methods. Third, several hardware acceleration techniques, including calculation pipelining, logical-loop unrolling, and module-level parallelism, are designed to support data-intensive, high-frame-rate 8K UHD video coding. Finally, because 8K video sources are scarce, we also provide a new dataset for performance verification. Experimental results demonstrate that the proposed method achieves real-time 8K intra encoding at more than 60 fps with negligible loss in rate-distortion (R-D) performance: an average Bjontegaard-Delta Bit-Rate (BD-BR) of 0.98%.
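The multiplierless idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the article's SAU design: the function names `csd_terms` and `shift_add_multiply`, the use of canonical signed-digit recoding, and the example constant 83 are all assumptions chosen for illustration. The sketch only shows how a multiplication by a fixed integer coefficient can be replaced by shifts, additions, and subtractions.

```python
def csd_terms(c):
    """Decompose integer c into (sign, shift) pairs so that
    c == sum(sign * 2**shift), using canonical signed-digit recoding
    (no two adjacent non-zero digits). Hypothetical helper."""
    terms = []
    shift = 0
    while c != 0:
        if c & 1:
            # Low two bits 01 -> digit +1, low two bits 11 -> digit -1.
            digit = 2 - (c & 3)
            terms.append((digit, shift))
            c -= digit
        c >>= 1
        shift += 1
    return terms


def shift_add_multiply(x, c):
    """Compute x * c for integers using only shifts, adds, and subtracts."""
    acc = 0
    for sign, shift in csd_terms(c):
        term = x << shift  # one shifter per non-zero digit
        acc = acc + term if sign > 0 else acc - term  # one adder/subtractor
    return acc


# Illustrative constant only; it is not taken from the article.
print(csd_terms(83))
print(shift_add_multiply(7, 83) == 7 * 83)
```

With the constant 83, the decomposition is 64 + 16 + 4 - 1, so `shift_add_multiply(7, 83)` evaluates `(7 << 6) + (7 << 4) + (7 << 2) - 7`. In hardware, the number of non-zero digits determines how many shifters and adders a constant multiplication needs, which is why reducing that count saves resources.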
Accelerating Transform Algorithm Implementation for Efficient Intra Coding of 8K UHD Videos