Abstract
QR decomposition (QRD), a matrix decomposition algorithm widely used in the embedded application domain, can be realized in a large number of valid processing sequences that differ significantly in the number of memory accesses and computations, and hence in the overall implementation energy. With modern low-power embedded processors evolving toward register files with wide memory interfaces and vector functional units (FUs), the data flow in these algorithms needs to be carefully devised to use the costly wide memory accesses and the vector FUs efficiently. In this article, we present an energy-efficient data flow transformation strategy for Givens rotation-based QRD.
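To make the notion of "valid processing sequences" concrete, the following is a minimal sketch in plain Python, not the transformation strategy of the article. It applies Givens rotations in two different elimination orders and yields the same upper-triangular factor R for a full-rank input. The function names (`givens_qr_r`, `adjacent_rows_sequence`, `fixed_pivot_sequence`) and the two orderings are illustrative assumptions; the sketch does not model memory accesses, vector FUs, or energy.

```python
import math


def givens_qr_r(a, sequence):
    """Return the R factor of `a` after applying Givens rotations in order.

    `sequence` is a list of (pivot_row, target_row, col) steps (a
    hypothetical encoding for this sketch); each step zeroes
    a[target_row][col] by rotating target_row against pivot_row.
    """
    r = [row[:] for row in a]  # work on a copy of the input matrix
    for p, t, j in sequence:
        x, y = r[p][j], r[t][j]
        h = math.hypot(x, y)
        if h == 0.0:
            continue  # both entries are already zero, nothing to rotate
        c, s = x / h, y / h
        for k in range(len(r[0])):
            u, v = r[p][k], r[t][k]
            r[p][k] = c * u + s * v
            r[t][k] = -s * u + c * v
        r[t][j] = 0.0  # clear the rounding residue of the eliminated entry
    return r


def adjacent_rows_sequence(m, n):
    """Column by column, bottom-up, rotating each row against the row above."""
    return [(i - 1, i, j) for j in range(n) for i in range(m - 1, j, -1)]


def fixed_pivot_sequence(m, n):
    """Column by column, top-down, always rotating against the diagonal row."""
    return [(j, i, j) for j in range(n) for i in range(j + 1, m)]
```

Both orderings perform the same number of rotations but touch rows in a different order, which is the kind of freedom the abstract refers to when it says the sequences differ in memory accesses.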
Index Terms
Data Flow Transformation for Energy-Efficient Implementation of Givens Rotation-Based QRD