Abstract
To improve FPGA performance for arithmetic circuits that are dominated by multi-input addition operations, an FPGA logic block is proposed that can be configured as a 6:2 or 7:2 compressor. Compressors have been used successfully in the past to realize parallel multipliers in VLSI technology; however, the peculiar structure of FPGA logic blocks, coupled with the high cost of the routing network relative to ASIC technology, renders compressors ineffective when mapped onto the general logic of an FPGA. On the other hand, current FPGA logic cells have already been enhanced with carry chains to improve arithmetic functionality, for example, to realize fast ternary carry-propagate addition. The contribution of this article is a new FPGA logic cell that is specialized to help realize efficient compressor trees on FPGAs. The new FPGA logic cell has two variants that can respectively be configured as a 6:2 or a 7:2 compressor using additional carry chains that, coupled with lookup tables, provide the necessary functionality. Experiments show that the use of these modified logic cells significantly reduces the delay of compressor trees synthesized on FPGAs compared to state-of-the-art synthesis techniques, with a moderate increase in area and power consumption.
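As an illustration of why compressor trees help with multi-input addition, the following is a minimal software sketch of carry-save reduction built from 3:2 compressors (full adders applied bitwise). It is not the 6:2 or 7:2 logic cell proposed in the article, and it models only the arithmetic, not delay, area, or FPGA mapping. The function names `carry_save_add`, `compress_to_two`, and `multi_operand_sum` are hypothetical and chosen for this example only.

```python
def carry_save_add(a, b, c):
    """3:2 compression on non-negative integers: a + b + c == s + k."""
    sum_bits = a ^ b ^ c                             # per-column sum, no carry propagation
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1  # per-column carry, moved one column left
    return sum_bits, carry_bits


def compress_to_two(operands):
    """Reduce a list of operands to two values with the same total."""
    values = list(operands)
    while len(values) > 2:
        full = len(values) - len(values) % 3         # operands that fit in groups of three
        next_level = []
        for i in range(0, full, 3):
            next_level.extend(carry_save_add(*values[i:i + 3]))
        next_level.extend(values[full:])             # leftovers pass through to the next level
        values = next_level
    while len(values) < 2:                           # pad so two values are always returned
        values.append(0)
    return values[0], values[1]


def multi_operand_sum(operands):
    """Compress to two values, then perform one carry-propagate addition."""
    s, k = compress_to_two(operands)
    return s + k


# Example: seven operands, as a 7:2 compressor tree would accept per column.
operands = [13, 7, 250, 99, 1, 64, 31]
assert multi_operand_sum(operands) == sum(operands)
```

Each level of the loop involves no carry propagation across bit positions, so only the final `s + k` requires a carry-propagate adder. That is the property compressor trees exploit in hardware.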
An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor