
Energy-Efficient Floating-Point Unit Design

Published: 01 July 2011

Abstract

Energy-efficient computation is critical if we are going to continue to scale performance in power-limited systems. For floating-point applications that have large amounts of data parallelism, one should optimize the throughput/mm² given a power density constraint. We present a method for creating a trade-off curve that can be used to estimate the maximum floating-point performance given a set of area and power constraints. Looking at FP multiply-add units and ignoring register and memory overheads, we find that in a 90 nm CMOS technology at 1 W/mm², one can achieve a performance of 27 GFlops/mm² single precision and 7.5 GFlops/mm² double precision. Adding register file overheads reduces the throughput by less than 50 percent if the compute intensity is high. Since the energy of the basic gates is no longer scaling rapidly, maintaining constant power density with scaling requires moving the overall FP architecture to a lower energy/performance point. A 1 W/mm² design at 90 nm is a "high-energy" design, so scaling it to a lower-energy design in 45 nm still yields a 7× performance gain, while a more balanced 0.1 W/mm² design only speeds up by 3.5× when scaled to 45 nm. Performance scaling below 45 nm rapidly decreases, with a projected improvement of only ~3× for both power densities when scaling to a 22 nm technology.
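To make the headline numbers concrete, the sketch below (Python) shows how the quoted throughput and power densities translate into a chip-level budget. Only the GFlops/mm², W/mm², and "less than 50 percent" register-file figures come from the abstract; the 4 mm² area budget, the 0.5 overhead factor, and the function names are illustrative assumptions, not values from the paper.

```python
# Back-of-the-envelope estimate of peak FP throughput under an area and
# power-density budget, using the headline numbers quoted in the abstract.
# The area budget and register-file overhead factor below are illustrative
# assumptions, not figures taken from the paper.

THROUGHPUT_PER_MM2 = {          # GFlops/mm^2 at 1 W/mm^2 in 90 nm (from abstract)
    "single": 27.0,
    "double": 7.5,
}
POWER_DENSITY_W_PER_MM2 = 1.0   # power-density constraint assumed in the abstract

def peak_gflops(precision: str, area_mm2: float, rf_overhead: float = 0.5) -> float:
    """Estimate sustained GFlops for a given FP datapath area budget.

    rf_overhead is the fractional throughput loss from register-file
    overheads; the abstract bounds it at "less than 50 percent" for
    high compute intensity, so 0.5 is a conservative assumption.
    """
    raw = THROUGHPUT_PER_MM2[precision] * area_mm2
    return raw * (1.0 - rf_overhead)

def power_budget_w(area_mm2: float) -> float:
    """Total FP power implied by the power-density constraint."""
    return POWER_DENSITY_W_PER_MM2 * area_mm2

if __name__ == "__main__":
    area = 4.0  # mm^2 of FP datapath -- an illustrative budget
    for prec in ("single", "double"):
        print(f"{prec}: ~{peak_gflops(prec, area):.0f} GFlops "
              f"in {area} mm^2 at {power_budget_w(area):.0f} W")
    # Scaling projections quoted in the abstract (not recomputed here):
    # 90 nm -> 45 nm: ~7x for the 1 W/mm^2 design, ~3.5x at 0.1 W/mm^2;
    # 45 nm -> 22 nm: only ~3x for either power density.
```

With these assumptions, a 4 mm² single-precision datapath delivers roughly 54 GFlops within a 4 W budget; the point of the sketch is only to show how the per-mm² figures compose, not to reproduce the paper's analysis.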

Index Terms

  1. Energy-Efficient Floating-Point Unit Design


    Reviews

    Srinivasa R Vemuru

    Hardware designs have been optimized historically for low latency and, in recent years, for low energy consumption. With the processor design paradigm shifting from single-core to multi-core technologies, a better target for hardware optimization would be the throughput. The authors present an optimization methodology that targets throughput under power density constraints for floating-point units. This tradeoff is established using throughput-energy density curves. The optimization procedures are explored for resource- and performance-constrained designs. The impact of register file overhead in the pipeline stages on the overall latency and power density is also considered. The main contribution of the paper is to show that the throughput can be improved at different rates at different power density levels. For constant energy density designs, the throughput will continue to scale up as the process moves to lower technology nodes, but will do so at slower rates. Online Computing Reviews Service
