Abstract
Compressor trees are a class of circuits that generalizes multioperand addition and the partial product reduction trees of parallel multipliers using carry-save arithmetic. Compressor trees occur naturally in many DSP applications, such as FIR filters, and, in the more general case, their use can be maximized by applying high-level transformations to arithmetically intensive data flow graphs. Due to the presence of carry-chains, it has long been thought that trees of 2- or 3-input carry-propagate adders are more efficient than compressor trees for FPGA synthesis; however, this is not the case. This article presents a heuristic for FPGA synthesis of compressor trees that outperforms adder trees and exploits carry-chains when possible. The experimental results show that, on average, compressor trees reduce critical path delay by 33% and 45%, respectively, compared to adder trees synthesized on Xilinx Virtex-5 and Altera Stratix III FPGAs.
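As a rough illustration of the carry-save arithmetic mentioned above, the following Python sketch reduces several operands with 3:2 compressors (bitwise full adders) and performs a single carry-propagate addition at the end. It is a software model only, not the synthesis heuristic presented in the article; the function names `carry_save_3to2` and `compressor_tree_sum` are hypothetical, and operands are assumed to be non-negative integers.

```python
def carry_save_3to2(a, b, c):
    """Compress three non-negative integers into a (sum, carry) pair.

    Each bit position acts as an independent full adder, so no carry
    ripples across the word. The invariant is a + b + c == sum + carry.
    """
    partial_sum = a ^ b ^ c                       # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit majority, weighted by 2
    return partial_sum, carry


def compressor_tree_sum(operands):
    """Add non-negative integers using levels of 3:2 compressors.

    Assumption: a plain greedy grouping of three operands per compressor;
    real compressor trees may use other counters and groupings.
    """
    values = list(operands)
    while len(values) > 2:
        next_level = []
        full_groups = len(values) // 3 * 3
        for i in range(0, full_groups, 3):
            next_level.extend(
                carry_save_3to2(values[i], values[i + 1], values[i + 2])
            )
        next_level.extend(values[full_groups:])   # leftovers pass through unchanged
        values = next_level
    # Only the last step needs a carry-propagate addition.
    return sum(values)
```

In this model every level costs one full-adder delay regardless of word length, and only the final two-operand addition propagates carries, which is the property that makes compressor trees attractive for multioperand addition.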
Index Terms
Compressor tree synthesis on commercial high-performance FPGAs