Abstract
Multi-input addition occurs in a variety of arithmetically intensive signal processing applications. The DSP blocks embedded in high-performance FPGAs perform fixed-bitwidth parallel multiplication and Multiply-ACcumulate (MAC) operations. In theory, the compressor trees contained within the multipliers could implement multi-input addition; however, they are not exposed to the programmer. To improve FPGA performance for these applications, this article introduces the Field Programmable Compressor Tree (FPCT) as an alternative to the DSP blocks. Because it provides just a compressor tree, the FPCT can perform multi-input addition directly, and can also perform parallel multiplication and MAC in conjunction with a small amount of FPGA general logic. Furthermore, the user can configure the FPCT to precisely match the bitwidths of the operands being summed. Although an FPCT cannot beat the performance of a well-designed ASIC compressor tree of fixed bitwidth, such as the 9×9 and 18×18-bit multipliers/MACs in DSP blocks, its configurable bitwidth and its ability to perform multi-input addition make it well suited to reconfigurable devices that are used across a variety of applications.
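As a rough illustration of the idea behind a compressor tree, the sketch below reduces many operands to two words using 3:2 compressors (word-wide full adders) and then performs a single carry-propagate addition. It is a generic software model of carry-save reduction, not the FPCT architecture described in the article; the function names are hypothetical, and it assumes non-negative integer operands.

```python
def compress_3_to_2(a, b, c):
    """3:2 compressor: three words in, a sum word and a carry word out.

    Each bit position acts as an independent full adder, so no carry
    ripples across the word. The invariant is a + b + c == s + carry.
    """
    sum_bits = a ^ b ^ c
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1
    return sum_bits, carry_bits


def compressor_tree_sum(operands):
    """Sum non-negative integers with carry-save reduction (illustrative)."""
    words = list(operands)
    while len(words) > 2:
        next_words = []
        # Compress complete groups of three; leftover words pass through.
        full = len(words) - len(words) % 3
        for i in range(0, full, 3):
            next_words.extend(compress_3_to_2(*words[i:i + 3]))
        next_words.extend(words[full:])
        words = next_words
    # The one remaining carry-propagate addition.
    return sum(words)


def multiply_via_tree(x, y):
    """Multiply by summing shifted partial products through the tree.

    Generating the partial products stands in for the small amount of
    general logic mentioned above (an assumption of this sketch).
    """
    partial_products = [x << i for i in range(y.bit_length()) if (y >> i) & 1]
    return compressor_tree_sum(partial_products)


print(compressor_tree_sum([3, 5, 7, 9, 11]))  # 35
print(multiply_via_tree(13, 11))              # 143
```

The same reduction routine serves both multi-input addition and multiplication, which mirrors the abstract's point that exposing only the compressor tree covers both use cases.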
Index Terms
Field Programmable Compressor Trees: Acceleration of Multi-Input Addition on FPGAs