research-article

UNILOGIC: A Novel Architecture for Highly Parallel Reconfigurable Systems

Published: 09 September 2020

Abstract

High-performance Computing (HPC) applications are becoming increasingly demanding in both performance and power, pushing HPC systems to their limits. Existing HPC systems have not yet reached exascale performance, mainly because of power limitations: extrapolating from today’s top HPC systems, about 100–200 MW would be required to sustain exaflop-level performance. A promising way to tackle these power limitations is to deploy energy-efficient reconfigurable resources, in the form of Field-programmable Gate Arrays (FPGAs), tightly integrated with conventional CPUs. However, current FPGA tools and programming environments are optimized for accelerating a single application, or even a single task, on a single FPGA device. In this work, we present UNILOGIC (Unified Logic), a novel HPC-tailored parallel architecture that efficiently incorporates FPGAs. UNILOGIC adopts the Partitioned Global Address Space (PGAS) model and extends it to include hardware accelerators, i.e., tasks implemented on the reconfigurable resources. The main advantages of UNILOGIC are that (i) the hardware accelerators can be accessed directly by any processor in the system, and (ii) the hardware accelerators can access any memory location in the system. The proposed architecture thus offers a unified environment in which all the reconfigurable resources can be used seamlessly by any processor or operating system. The UNILOGIC architecture also provides hardware virtualization of the reconfigurable logic, so that the hardware accelerators can be shared among multiple applications or tasks.
The FPGA layer of the architecture is implemented by splitting its reconfigurable resources into (i) a static partition, which provides the PGAS-related communication infrastructure, and (ii) fixed-size, dynamically reconfigurable slots that can be programmed and accessed independently or combined to support both fine- and coarse-grained reconfiguration. Finally, the UNILOGIC architecture has been evaluated on a custom prototype that consists of two 1U chassis, each of which includes eight interconnected daughter boards, called Quad-FPGA Daughter Boards (QFDBs). Each QFDB supports four tightly coupled Xilinx Zynq Ultrascale+ MPSoCs as well as 64 Gigabytes of DDR4 memory, so the prototype features a total of 64 Zynq MPSoCs and 1 Terabyte of memory. We tuned and evaluated the UNILOGIC prototype using low-level (baremetal) performance tests as well as two popular real-world HPC applications, one compute-intensive and one data-intensive. Our evaluation shows that UNILOGIC is 2.5 to 400 times faster and 46 to 300 times more energy efficient than conventional parallel systems that use only high-end CPUs. It also outperforms GPUs by a factor of 3 to 6 in time to solution and by a factor of 10 to 20 in energy to solution.
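As a rough illustration of the partitioned global address space idea and of the prototype totals quoted in the abstract, the following Python sketch recomputes the number of MPSoCs and the total memory, then maps a (node, offset) pair to a single global address and back. It is only a sketch under stated assumptions: the abstract does not describe the actual address layout, so the even split of each QFDB's memory across its four MPSoCs, the contiguous one-partition-per-node layout, the binary interpretation of Gigabyte and Terabyte, and the helper names `to_global` and `from_global` are all hypothetical.

```python
from typing import Tuple

GIB = 1 << 30  # assumption: "Gigabyte" is read as 2**30 bytes

# Figures taken from the abstract.
CHASSIS = 2
QFDBS_PER_CHASSIS = 8
MPSOCS_PER_QFDB = 4
DDR_PER_QFDB = 64 * GIB

# Derived totals: 2 * 8 * 4 = 64 MPSoCs.
NODES = CHASSIS * QFDBS_PER_CHASSIS * MPSOCS_PER_QFDB

# Assumption (not stated in the abstract): each QFDB's memory is split
# evenly across its four MPSoCs, giving one equal partition per node.
PARTITION_SIZE = DDR_PER_QFDB // MPSOCS_PER_QFDB
TOTAL_MEMORY = NODES * PARTITION_SIZE


def to_global(node: int, offset: int) -> int:
    """Map a (node, local offset) pair to a hypothetical global address."""
    if not 0 <= node < NODES:
        raise ValueError(f"node {node} out of range")
    if not 0 <= offset < PARTITION_SIZE:
        raise ValueError(f"offset {offset} out of range")
    return node * PARTITION_SIZE + offset


def from_global(address: int) -> Tuple[int, int]:
    """Recover (node, local offset) from a hypothetical global address."""
    if not 0 <= address < TOTAL_MEMORY:
        raise ValueError(f"address {address} out of range")
    return divmod(address, PARTITION_SIZE)


if __name__ == "__main__":
    print("MPSoCs:", NODES)
    print("Total memory (GiB):", TOTAL_MEMORY // GIB)
    addr = to_global(5, 4096)
    print("node 5, offset 4096 ->", hex(addr), "->", from_global(addr))
```

In a PGAS system of this kind, any processor or accelerator could in principle issue an access to such a global address, with the interconnect routing it to the owning node; the sketch only models the address arithmetic, not the communication infrastructure.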

              Published in

              ACM Transactions on Reconfigurable Technology and Systems, Volume 13, Issue 4
              Special Section on FCCM 2019 and Regular Papers
              December 2020
              112 pages
              ISSN: 1936-7406
              EISSN: 1936-7414
              DOI: 10.1145/3419942
              Editor: Deming Chen

              Copyright © 2020 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 9 September 2020
              • Accepted: 1 June 2020
              • Revised: 1 March 2020
              • Received: 1 December 2019

              Qualifiers

              • research-article
              • Research
              • Refereed
