
Field Programmable Compressor Trees: Acceleration of Multi-Input Addition on FPGAs

Published: 01 June 2009

Abstract

Multi-input addition occurs in a variety of arithmetically intensive signal processing applications. The DSP blocks embedded in high-performance FPGAs perform fixed-bitwidth parallel multiplication and Multiply-ACcumulate (MAC) operations. In theory, the compressor trees contained within the multipliers could implement multi-input addition; however, they are not exposed to the programmer. To improve FPGA performance for these applications, this article introduces the Field Programmable Compressor Tree (FPCT) as an alternative to the DSP blocks. Because it provides just a compressor tree, the FPCT can perform multi-input addition directly, and it can perform parallel multiplication and MAC in conjunction with a small amount of FPGA general logic. Furthermore, the user can configure the FPCT to precisely match the bitwidths of the operands being summed. An FPCT cannot beat the performance of a well-designed ASIC compressor tree of fixed bitwidth, such as those in the 9×9-bit and 18×18-bit multipliers/MACs in DSP blocks. However, its configurable bitwidth and its ability to perform multi-input addition make it ideal for reconfigurable devices that are used across a variety of applications.
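As a rough illustration of what a compressor tree computes, the following Python sketch reduces several operands with bitwise 3:2 compressors (carry-save adders) until two rows remain, then performs a single ordinary addition. It is a software model written for this summary, not the FPCT architecture from the article; the function names `compress_3_to_2`, `multi_input_add`, and `multiply` are hypothetical, and operands are assumed to be non-negative integers.

```python
def compress_3_to_2(a, b, c):
    """Bitwise 3:2 compressor (carry-save adder): a + b + c == s + carry."""
    s = a ^ b ^ c                                  # per-column sum bits
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-column carries, moved up one column
    return s, carry


def multi_input_add(operands):
    """Sum non-negative integers using layers of 3:2 compressors and one final add."""
    rows = list(operands)
    while len(rows) > 2:
        next_rows = []
        # Compress every complete group of three rows into two rows.
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(compress_3_to_2(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that did not fit into a group of three pass through unchanged.
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    # Stand-in for the final carry-propagate adder.
    return sum(rows)


def multiply(x, y):
    """Parallel multiplication modeled as multi-input addition of partial products."""
    partial_products = [x << i for i in range(y.bit_length()) if (y >> i) & 1]
    return multi_input_add(partial_products)


if __name__ == "__main__":
    print(multi_input_add([3, 5, 7, 9, 11]))
    print(multiply(13, 11))
```

In this model the partial-product generation in `multiply` plays the role that the abstract assigns to a small amount of general logic placed in front of the compressor tree; only the last step propagates carries across the full word.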

