REFRESH FPGAs: Sustainable FPGA Chiplet Architectures

Demand is growing for more, and increasingly agile, computational power in edge and cloud infrastructure to serve the computationally complex needs of ubiquitous computing devices. An important challenge is therefore addressing the holistic environmental impacts of these next-generation computing systems. This requires a life-cycle view of sustainability for computing advancements, so that environmental impacts such as greenhouse warming gas emissions from these computing choices can be reduced. Unfortunately, decadal efforts to address operational energy efficiency in computing devices have ignored, and in some cases exacerbated, the embodied impacts of manufacturing these edge and cloud systems, particularly their integrated circuits. During this time, FPGA architectures have not changed dramatically except to increase in size. Given this context, we propose REFRESH FPGAs: new FPGA devices and architectures built from recently retired FPGA dies using 2.5D integration. Building REFRESH FPGAs requires creative architectures that leverage existing chiplet pins with an inexpensive-to-manufacture interposer, coupled with creative design automation. In this paper, we discuss how REFRESH FPGAs can leverage industry trends for renewable energy integration into data centers while providing an overall improvement in sustainability and amortizing their significant embodied cost investment over a much longer "first" lifetime.


I. INTRODUCTION
As we have become firmly ensconced in the post-Moore era, computer architectures have turned to accelerators to execute computationally and/or memory-intensive applications with improved performance. However, an emerging concern is the environmental impact of decisions made about these next-generation architectures. Until recently, sustainable computing was chiefly concerned with operational energy efficiency, aiming to reduce greenhouse warming gas (GHG) emissions, such as CO₂, from electricity generated with fossil fuels to power these systems. However, continuing advances in renewable energy integration, coupled with the realization that embodied GHG emissions from chip manufacturing for these systems are significant and in many cases dominant, have changed the calculus of sustainable computing [1]-[5].
Fig. 1 shows examples of life-cycle assessments of a variety of computing devices. For mobile devices, embodied carbon tends to exceed 75% of total carbon. However, even the desktop and server examples show at least 50% of their carbon coming from embodied carbon. For data center systems, the embodied energy was about 33% of the overall energy compared to 65% from operational energy. However, with renewable integration, embodied carbon accounts for 82% of emissions compared to 18% for operational carbon in leading hyperscalers [1], [6]. Thus, there is a growing movement to address the environmental impacts of computing systems holistically, throughout all stages of their life cycle, including manufacturing, supply chain, operation, and disposal. Focusing solely on operational energy efficiency, without taking the embodied environmental impacts of computing systems into consideration, cannot achieve true sustainability in the long run.
Towards this goal, we propose REFRESH, or Revisiting Expanding FPGA Real-estate for Environmentally Sustainable Heterogeneous-Systems. The REFRESH concept is based on several FPGA-specific observations. The replacement cycle for systems with accelerators (particularly FPGAs) is very fast (circa two years) [7] due to increasingly short support lifetimes from the vendors. However, these retired devices have many years of effective service life remaining. For sustainability, amortizing their embodied environmental impact investment over a longer service life is desirable. Moreover, the regularity, maturity, and flexibility of these devices suggest they have the most potential for obtaining value from increasingly long lifetimes. Furthermore, REFRESH reduces pressure to extract raw materials such as rare earth minerals, while also significantly reducing the growing environmental risks of e-waste by keeping these toxic, non-biodegradable devices out of landfills, reducing their accumulation in soil, air, and water and minimizing health impacts on living things.
REFRESH proposes to build "new" FPGA devices from recently retired FPGAs using 2.5D integration of these FPGA dies with an underlying interposer. The interposer provides interconnection between FPGA chiplets as well as through-silicon vias (TSVs) to the underlying package pins, as illustrated in Fig. 2. This allows FPGA devices to achieve a much longer "first" lifetime while meeting the needs of accelerator programmers for increasingly large and capable configurable fabrics. Given the increasing investment in renewable energy, moderate increases in operational energy may have a minimal negative environmental impact while achieving substantial improvements in embodied environmental impacts.

II. REFRESH CONCEPT AND SUSTAINABILITY
Modern FPGAs are already transitioning to chiplet-based design to increase yields for such large devices. Building REFRESH FPGAs requires advances in FPGA architecture and design flows that are consistent with the challenges of designing for chiplet-based FPGAs, along with new challenges of more restricted long-distance interconnect and of reliability, for which FPGA architectures are well suited.

A. REFRESH Architecture and Design Automation Co-Design
Chiplet-based FPGAs must address the limited bandwidth between chiplets, often referred to as super long lines (SLLs). Because dies for REFRESH devices have already been packaged, in REFRESH we introduce the concept of super duper long lines (SDLLs) to account for I/O that has already been routed to package pins. In Fig. 2 we show that REFRESH devices may integrate monolithic devices as well as chiplet-based devices. Thus, characterization of communication (SLLs and SDLLs) across boundaries through an interposer is critical to inform chiplet layout and interposer design for REFRESH FPGAs, including consideration of which homogeneous and heterogeneous architectures, potentially including devices from different generations and high-bandwidth memory, can be retrofitted into a 2.5D System-in-Package (SiP) design. A critical tool for programming these devices will be a fine-grained automated flow to partition designs across chip boundaries [8]-[12].
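To make the partitioning concern concrete, the following is a minimal sketch of how a candidate partition could be checked against a limited number of inter-die links. It is not the authors' flow: the function names, the two-pin net model, and the link budget are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def inter_die_links(nets, placement):
    """Count nets that cross each unordered pair of dies.

    nets      : iterable of (block_u, block_v) two-pin connections (assumed model)
    placement : dict mapping block name -> die name
    """
    crossings = Counter()
    for u, v in nets:
        die_u, die_v = placement[u], placement[v]
        if die_u != die_v:
            crossings[frozenset((die_u, die_v))] += 1
    return crossings


def over_budget(crossings, budget):
    """Return the die pairs whose crossing count exceeds the available links."""
    return {pair: n for pair, n in crossings.items() if n > budget.get(pair, 0)}


# Hypothetical four-block design split across two dies.
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("b", "d")]
placement = {"a": "die0", "b": "die0", "c": "die1", "d": "die1"}

# Hypothetical budget: only two interposer links between die0 and die1.
budget = {frozenset(("die0", "die1")): 2}

crossings = inter_die_links(nets, placement)
violations = over_budget(crossings, budget)
print(dict(crossings))
print(violations)
```

A real flow would also need to weigh link latency and distinguish SLL from SDLL resources; this sketch only counts crossings.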

B. REFRESH Hardware Analysis and Conceptualization
FPGA architecture has been relatively static in terms of innovation in look-up tables, multiply-accumulate units, block memories, etc., over the last decade or more. Actual FPGA advances have been in the capacity that FPGAs can support, while performance in terms of clock frequency and energy for a fixed design has not improved dramatically [13].
To demonstrate this, we implemented a 32-bit floating-point matrix multiplication design on three generations of FPGA fabrics from AMD/Xilinx, shown in Table I. Each new generation brings a benefit; however, the improvement is not dramatic (a 50% improvement in latency from 28 nm to 7 nm). The dynamic power drops from 22 W to 13 W, but the static power grows from 0.8 W to almost 10 W. These power estimates are from the AMD/Xilinx tool flow.
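As a quick check on what these figures imply, the sketch below sums the quoted dynamic and static power for the 28 nm and 7 nm implementations. It uses only the numbers stated above and treats "almost 10 W" as exactly 10 W, which is an approximation; under that assumption the total power is roughly 23 W in both cases, with static power making up a much larger share at 7 nm.

```python
# Power figures quoted in the text for the 28 nm and 7 nm implementations.
# "almost 10 W" is approximated as 10.0 W here (an assumption).
power_w = {
    "28nm": {"dynamic": 22.0, "static": 0.8},
    "7nm": {"dynamic": 13.0, "static": 10.0},
}


def total_power(entry):
    """Total power is the sum of dynamic and static power."""
    return entry["dynamic"] + entry["static"]


totals = {node: total_power(p) for node, p in power_w.items()}
static_share = {node: p["static"] / totals[node] for node, p in power_w.items()}

for node in power_w:
    print(f"{node}: total = {totals[node]:.1f} W, "
          f"static share = {static_share[node]:.0%}")
```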
To compare two design choices for their sustainability, we must combine the manufacturing (embodied) impacts ($E_i$) and operational impacts ($O_i$) of such systems into a single relevant number based on the system lifetime ($L_i$). We do this using indifference and break-even calculations [2], as described in Eq. 1. To demonstrate the potential value of REFRESH FPGAs, we show some system-level comparisons in Fig. 3. First, in Fig. 3a, we show the indifference point ($t_I$) for large-scale matrix multiplication using a VM1802 FPGA compared to a first-order approximation of a REFRESH FPGA composed of four ZCU102 devices. The VM1802 has a significantly higher embodied contribution than the REFRESH device, while the REFRESH device has a higher operational contribution and lower performance. Thus, $t_I$ is the time at which the VM1802 has saved enough operational carbon to offset its higher embodied carbon relative to the REFRESH FPGA.
We show three cases: $r_{sleep} = \{25\%, 50\%, 25\%\}$, $r_{active} = \{25\%, 50\%, 75\%\}$, where $r_{sleep}$ is the ratio of sleep time to total time in service and $r_{active}$ is the ratio of computation time to non-sleep time, including idle time. Cases 1 and 2 do a similar amount of work but have different sleep-to-idle ratios, while Case 3 does 3× the work of Case 1. For a VM1802 FPGA accelerator fabricated in Taiwan and used in CA, $t_I \le 1$ year, because the operational savings eventually make up for the embodied overhead. However, as renewable energy penetration increases, the indifference time increases, reaching three years for Cases 1 and 2, and two years for Case 3.
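The indifference comparison can be illustrated with a minimal sketch. It assumes a simple linear model in which each device's cumulative carbon is its embodied carbon plus a constant yearly operational rate; this is a common first-order form and is not necessarily the exact Eq. 1 used by the authors. All device numbers (embodied carbon, power draws, carbon intensities) are hypothetical placeholders, and the sketch ignores the performance difference between the devices.

```python
def active_fraction(r_sleep, r_active):
    """Fraction of total in-service time spent computing."""
    return (1.0 - r_sleep) * r_active


def average_power(p_active, p_idle, p_sleep, r_sleep, r_active):
    """Time-averaged power (W) for a duty cycle.

    r_sleep  : fraction of total in-service time spent asleep
    r_active : fraction of non-sleep time spent computing (the rest is idle)
    """
    awake = 1.0 - r_sleep
    return (r_sleep * p_sleep
            + awake * r_active * p_active
            + awake * (1.0 - r_active) * p_idle)


def operational_rate(avg_power_w, intensity_g_per_kwh):
    """Operational emissions in kg CO2e per year at a given grid intensity."""
    kwh_per_year = avg_power_w * 24 * 365 / 1000.0
    return kwh_per_year * intensity_g_per_kwh / 1000.0


def indifference_time(e_a, o_a, e_b, o_b):
    """Years t at which e_a + o_a * t equals e_b + o_b * t.

    Returns None when the two lines never cross at a positive time.
    """
    if o_a == o_b:
        return None
    t = (e_a - e_b) / (o_b - o_a)
    return t if t > 0 else None


# Duty cycles for the three cases described in the text.
cases = [(0.25, 0.25), (0.50, 0.50), (0.25, 0.75)]

# Hypothetical devices: "new" has higher embodied carbon (kg CO2e) but lower
# power (W); "reuse" has lower embodied carbon but higher power.
new = {"embodied": 100.0, "active": 23.0, "idle": 12.0, "sleep": 1.0}
reuse = {"embodied": 20.0, "active": 40.0, "idle": 20.0, "sleep": 2.0}


def t_indiff(case, intensity):
    """Indifference time (years) for one duty-cycle case and grid intensity."""
    r_sleep, r_active = case
    rates = [
        operational_rate(
            average_power(d["active"], d["idle"], d["sleep"], r_sleep, r_active),
            intensity,
        )
        for d in (new, reuse)
    ]
    return indifference_time(new["embodied"], rates[0], reuse["embodied"], rates[1])


for i, case in enumerate(cases, start=1):
    # 400 and 100 g CO2e/kWh are hypothetical "fossil-heavy" and "greener" grids.
    print(f"case {i}: work fraction = {active_fraction(*case):.4f}, "
          f"t_I(400) = {t_indiff(case, 400.0):.2f} y, "
          f"t_I(100) = {t_indiff(case, 100.0):.2f} y")
```

In this linear model the operational rate scales with grid carbon intensity, so a cleaner grid lengthens the indifference time, which matches the trend described above.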

III. CONCLUSIONS
Several critical challenges remain to be solved to build effective and reliable REFRESH devices, including the impact of die aging, die connection topology, connection bandwidth, architectural choices, fault tolerance, and replacement cycles for current and future acceleration workloads. However, REFRESH FPGAs have the potential to provide better sustainability over the system life cycle [2]-[4], [14]-[17] for applications such as hyperdimensional computing, deep learning [8], [18]-[25], and bioinformatics [26]-[28].

Fig. 1: Sources of CO₂e from different computing products.

Fig. 2: REFRESH interposer for integration of homogeneous and heterogeneous monolithic and/or chiplet-based FPGAs.

Fig. 3: Carbon indifference plots for a VM1802 vs. a REFRESH FPGA made from four ZCU102 2.5D integrated dies.

TABLE I: 32-bit floating-point matrix multiplication implemented on different FPGA generations.