research-article

An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor

Published: 01 September 2009

Abstract

To improve FPGA performance for arithmetic circuits that are dominated by multi-input addition operations, an FPGA logic block is proposed that can be configured as a 6:2 or 7:2 compressor. Compressors have been used successfully in the past to realize parallel multipliers in VLSI technology; however, the peculiar structure of FPGA logic blocks, coupled with the high cost of the routing network relative to ASIC technology, renders compressors ineffective when mapped onto the general logic of an FPGA. On the other hand, current FPGA logic cells have already been enhanced with carry chains to improve arithmetic functionality, for example, to realize fast ternary carry-propagate addition. The contribution of this article is a new FPGA logic cell that is specialized to help realize efficient compressor trees on FPGAs. The new FPGA logic cell has two variants that can respectively be configured as a 6:2 or a 7:2 compressor using additional carry chains that, coupled with lookup tables, provide the necessary functionality. Experiments show that the use of these modified logic cells significantly reduces the delay of compressor trees synthesized on FPGAs compared to state-of-the-art synthesis techniques, with a moderate increase in area and power consumption.
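The idea behind a compressor tree can be illustrated with a minimal behavioral sketch in Python. It is not the 6:2 or 7:2 logic cell proposed in the article: it assumes plain 3:2 counters (full adders applied bitwise) and non-negative integers as operands, and the function names `csa`, `compressor_tree`, and `multi_operand_add` are hypothetical. The point it shows is that many operands can be reduced to two without propagating carries, so only the final two-operand addition needs a carry-propagate adder.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise 3:2 counter (carry-save adder).

    Returns (sum_word, carry_word) such that a + b + c == sum_word + carry_word.
    No carry travels along the word; each bit position is handled independently.
    """
    sum_word = a ^ b ^ c                            # per-bit sum
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, shifted to the next rank
    return sum_word, carry_word


def compressor_tree(operands: list[int]) -> tuple[int, int]:
    """Reduce any number of non-negative operands to two, using only 3:2 counters."""
    ops = list(operands)
    while len(ops) > 2:
        next_level = []
        i = 0
        # Each group of three operands becomes two; leftovers pass through unchanged.
        while i + 3 <= len(ops):
            s, c = csa(ops[i], ops[i + 1], ops[i + 2])
            next_level += [s, c]
            i += 3
        next_level += ops[i:]
        ops = next_level
    while len(ops) < 2:  # pad so that a pair is always returned
        ops.append(0)
    return ops[0], ops[1]


def multi_operand_add(operands: list[int]) -> int:
    """Compressor tree followed by a single carry-propagate addition."""
    s, c = compressor_tree(operands)
    return s + c  # the only step that propagates carries
```

For example, `multi_operand_add([13, 7, 21, 3, 30, 9, 18])` gives the same result as `sum(...)` on the same list. A wider compressor such as the 6:2 or 7:2 variants named in the abstract would consume more operands per level, which is where the reported delay reduction comes from; the sketch does not model that cell.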

        Reviews

        Srinivasa R Vemuru

        Field-programmable gate arrays (FPGAs) are ideal platforms for prototyping hardware, due to the fast turnaround time and inherent reconfigurable nature of the devices. FPGAs are increasingly being used in low-to-medium-volume markets. Although their performance is superior to software implementations, FPGA implementations still have significantly lower speeds than application-specific integrated circuit implementations. To reduce this performance gap, FPGAs are equipped with additional hardware and programmable features to improve the performance of arithmetic blocks. Compressor trees are very suitable for multiple digit addition and fast multiplication applications. The authors present new enhancements to commercial FPGA architectures that make it easier to implement 6:2 and 7:2 compressors. The paper has a good introduction to arithmetic primitives and their mapping onto the FPGA hardware resources. The authors describe the cell modifications to improve the performance of arithmetic primitives. They discuss in detail the heuristics to map compressor trees to the modified FPGA cells. They study the critical path delay and power consumption of four different implementations of multiple benchmark arithmetic circuits on the modified FPGA architecture. The four implementations are ternary, generalized parallel counters (GPC), GPC with 6:2 compressors, and GPC with 7:2 compressors. Overall, the implementations based on compressors have significantly reduced delays, with an increase in the use of FPGA resources and power consumption. The paper should be of interest to researchers in the areas of FPGA architectures and computer arithmetic.

        Online Computing Reviews Service
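The generalized parallel counters (GPCs) mentioned in the review can be pictured with a short behavioral model in Python. This is only a sketch under stated assumptions: the function name `gpc` and the list-of-columns input layout are hypothetical, and the model describes what a GPC computes, not how the paper maps it onto logic cells. A GPC takes input bits of several ranks (rank r has weight 2**r) and outputs their weighted sum as a binary number.

```python
def gpc(columns: list[list[int]]) -> list[int]:
    """Behavioral model of a generalized parallel counter.

    columns[r] holds the input bits of rank r (weight 2**r).
    Returns the output bits, least significant bit first.
    """
    # Weighted count of the ones across all input columns.
    total = sum(bit << rank for rank, col in enumerate(columns) for bit in col)
    # Largest value the inputs could produce fixes the output width.
    max_total = sum(len(col) << rank for rank, col in enumerate(columns))
    width = max_total.bit_length()
    return [(total >> i) & 1 for i in range(width)]


# Five bits of rank 0 and three bits of rank 1: the largest sum is 5 + 2*3 = 11,
# so four output bits are enough.
outputs = gpc([[1, 1, 0, 1, 1], [1, 0, 1]])
```

In this example the inputs add up to 4 + 2*2 = 8, so `outputs` is `[0, 0, 0, 1]` (least significant bit first). A single column of six bits behaves like an ordinary 6:3 counter.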

        • Published in

          ACM Transactions on Reconfigurable Technology and Systems, Volume 2, Issue 3
          September 2009
          121 pages
          ISSN:1936-7406
          EISSN:1936-7414
          DOI:10.1145/1575774

          Copyright © 2009 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 September 2009
          • Accepted: 1 June 2009
          • Revised: 1 February 2009
          • Received: 1 August 2008

          Qualifiers

          • research-article
          • Research
          • Refereed
