Abstract
Multiple-input multiple-output (MIMO) systems with orthogonal frequency division multiplexing (OFDM) typically use orthogonal-triangular (QR) decomposition. In this article, we present an enhanced systolic array architecture that realizes QR decomposition of a 4 × 4 real matrix using the Givens rotation (GR) method. The coordinate rotation digital computer (CORDIC) algorithm is adopted and modified to speed up and simplify the GR computation. To verify their function and evaluate their performance, the proposed architectures are validated on a Virtex 5 FPGA development platform. Compared to a commercial implementation of vectoring CORDIC, the enhanced vectoring CORDIC uses 37.7% fewer hardware resources, dissipates 71.6% less power, and provides a 1.8-fold speedup while maintaining the same computation accuracy. Compared to our previously proposed QR systolic array architecture, the enhanced QR systolic array architecture based on the enhanced vectoring CORDIC dissipates 24.5% less power, improves throughput 1.5-fold, and improves hardware efficiency 1.45-fold with no accuracy penalty.
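As a rough illustration of the idea summarized above, the following sketch performs QR decomposition by Givens rotations, with each rotation computed by a conventional (textbook) vectoring CORDIC. It is not the enhanced CORDIC or the systolic array proposed in the article. The function names, the 16-iteration count, and the use of floating point instead of fixed-point shift-and-add hardware are all assumptions made for readability.

```python
import math

N_ITER = 16  # assumed number of CORDIC micro-rotations (illustrative choice)

# Accumulated CORDIC gain: product of sqrt(1 + 2^(-2i)) over all iterations.
GAIN = 1.0
for _i in range(N_ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_vectoring(x, y):
    """Drive y toward zero; return (flip, directions, magnitude)."""
    flip = x < 0  # pre-rotate by pi so the vector lies in the right half-plane
    if flip:
        x, y = -x, -y
    dirs = []
    for i in range(N_ITER):
        d = -1 if y > 0 else 1  # rotate toward the x axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return flip, dirs, x / GAIN


def cordic_rotate(x, y, flip, dirs):
    """Apply the micro-rotations recorded by cordic_vectoring to (x, y)."""
    if flip:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x / GAIN, y / GAIN


def qr_givens_cordic(a):
    """Return (q, r) with q orthogonal and r upper triangular, q @ r ~= a."""
    n = len(a)
    r = [[float(v) for v in row] for row in a]
    qt = [[float(i == j) for j in range(n)] for i in range(n)]  # holds Q^T
    for j in range(n):
        for i in range(j + 1, n):
            # Vectoring mode finds the rotation that zeroes r[i][j].
            flip, dirs, mag = cordic_vectoring(r[j][j], r[i][j])
            r[j][j], r[i][j] = mag, 0.0
            # Rotation mode applies the same rotation to the rest of both rows.
            for k in range(j + 1, n):
                r[j][k], r[i][k] = cordic_rotate(r[j][k], r[i][k], flip, dirs)
            for k in range(n):
                qt[j][k], qt[i][k] = cordic_rotate(qt[j][k], qt[i][k], flip, dirs)
    q = [[qt[j][i] for j in range(n)] for i in range(n)]
    return q, r


def matmul(p, s):
    """Plain nested-list matrix product, used only to check the result."""
    return [[sum(p[i][k] * s[k][j] for k in range(len(s)))
             for j in range(len(s[0]))] for i in range(len(p))]
```

In this sketch the vectoring step plays the role of the boundary cell of a QR systolic array and the rotation step plays the role of the internal cells; accuracy is limited by the chosen iteration count, so the product `matmul(q, r)` matches the input only to roughly four decimal digits.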
Index Terms
CORDIC-Based Enhanced Systolic Array Architecture for QR Decomposition