Abstract
The computational load of motion estimation in the advanced video coding (AVC) standard is high, and it grows further for HDTV and super-resolution sequences. In this article, a video processing algorithm is dynamically mapped onto a new parallel reconfigurable computing (PRC) architecture that consists of multiple dynamic reconfigurable computing (DRC) units. First, we construct a directed acyclic graph (DAG) to represent video coding algorithms, with motion estimation as the focus. A novel parallel partitioning approach is then proposed to map the motion estimation DAG onto the multiple DRC units of a PRC system. The partitioning algorithm supports design optimization of parallel reconfigurable systems for a given number of processing elements across different search ranges. This speeds up video processing with minimal sacrifice.
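To make the idea of mapping a motion estimation DAG onto several processing units concrete, the following is a minimal sketch and not the article's partitioning algorithm, which the abstract does not specify. It assumes a full-search block-matching model in which each candidate displacement yields one independent cost node (labelled `sad` here) feeding a single `min` node, and it uses a simple level-by-level round-robin assignment. The function names `build_me_dag`, `levels`, and `partition` are hypothetical.

```python
from collections import defaultdict


def build_me_dag(search_range):
    """Build a hypothetical DAG for one block (illustrative model only).

    Each candidate displacement (dx, dy) in the search window gets one
    independent cost node; a single 'min' node depends on all of them.
    The DAG is a dict mapping node -> list of predecessor nodes.
    """
    candidates = [
        (dx, dy)
        for dy in range(-search_range, search_range + 1)
        for dx in range(-search_range, search_range + 1)
    ]
    dag = {("sad", c): [] for c in candidates}
    dag[("min",)] = [("sad", c) for c in candidates]
    return dag


def levels(dag):
    """Return the ASAP level of every node (0 for nodes with no predecessors)."""
    level = {}

    def visit(node):
        if node not in level:
            preds = dag[node]
            level[node] = 0 if not preds else 1 + max(visit(p) for p in preds)
        return level[node]

    for node in dag:
        visit(node)
    return level


def partition(dag, num_units):
    """Assign nodes to units level by level, round-robin within each level.

    Returns schedule[level][unit] = list of nodes. This is a stand-in
    heuristic, assumed for illustration, not the method from the article.
    """
    by_level = defaultdict(list)
    for node, lvl in levels(dag).items():
        by_level[lvl].append(node)

    schedule = []
    for lvl in sorted(by_level):
        units = [[] for _ in range(num_units)]
        for i, node in enumerate(by_level[lvl]):
            units[i % num_units].append(node)
        schedule.append(units)
    return schedule


if __name__ == "__main__":
    dag = build_me_dag(search_range=2)
    for lvl, units in enumerate(partition(dag, num_units=4)):
        print(lvl, [len(u) for u in units])
```

In this toy model, widening the search range increases the number of independent cost nodes quadratically, while the number of units determines how evenly each level's work can be spread. That interaction between search range and unit count is the trade-off a partitioner has to manage.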
Index Terms
Parallel Reconfigurable Computing-Based Mapping Algorithm for Motion Estimation in Advanced Video Coding