Data Flow Transformation for Energy-Efficient Implementation of Givens Rotation--Based QRD

Published: 13 January 2016

Abstract

QR decomposition (QRD), a matrix decomposition algorithm widely used in the embedded application domain, can be realized in many valid processing sequences that differ significantly in the number of memory accesses and computations, and hence in overall implementation energy. As modern low-power embedded processors evolve toward register files with wide memory interfaces and vector functional units (FUs), the data flow in these algorithms must be carefully devised to use the costly wide memory accesses and the vector FUs efficiently. In this article, we present an energy-efficient data flow transformation strategy for the Givens rotation--based QRD.
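To make the notion of "valid processing sequences" concrete, the following is a minimal sketch of a Givens rotation-based QRD that takes the rotation order as an input. It is an illustration only, not the transformation strategy presented in the article. The function names (`givens_qr_r`, `adjacent_rows_order`, `fixed_pivot_order`) and the two orderings are assumptions chosen for the example, and the code covers real-valued square matrices only.

```python
import math

def givens_qr_r(matrix, pairs):
    """Return the R factor after applying Givens rotations in the given order.

    `pairs` is a list of (pivot_row, target_row, col) tuples. Each rotation
    mixes the two rows so that entry (target_row, col) becomes zero.
    """
    r = [row[:] for row in matrix]          # work on a copy
    n_cols = len(r[0])
    for p, t, c in pairs:
        a, b = r[p][c], r[t][c]
        h = math.hypot(a, b)
        if h == 0.0:
            continue                        # entry is already zero
        cs, sn = a / h, b / h               # cosine and sine of the rotation
        for j in range(n_cols):
            x, y = r[p][j], r[t][j]
            r[p][j] = cs * x + sn * y
            r[t][j] = -sn * x + cs * y
    return r

def adjacent_rows_order(n):
    """Column by column, bottom-up, rotating each row against the row above."""
    return [(i - 1, i, c) for c in range(n) for i in range(n - 1, c, -1)]

def fixed_pivot_order(n):
    """Column by column, top-down, always rotating against the diagonal row."""
    return [(c, i, c) for c in range(n) for i in range(c + 1, n)]

if __name__ == "__main__":
    a = [[4.0, 1.0, 2.0],
         [2.0, 3.0, 1.0],
         [1.0, 2.0, 5.0]]
    for order in (adjacent_rows_order, fixed_pivot_order):
        print(order.__name__)
        for row in givens_qr_r(a, order(3)):
            print(["%.4f" % v for v in row])
```

Both orderings apply the same number of rotations and reach an upper-triangular R that agrees up to the sign of its entries, but they touch row pairs in a different sequence: the first changes both rows on every step, while the second reuses the pivot row throughout a column. Differences of this kind in row reuse are what make the choice of sequence relevant to memory traffic.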
