Abstract
Designing heterogeneous systems that include domain-specific accelerators is a challenging and time-consuming process. Under area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for several forms of parallel execution, including loop-level, task-level, and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present
Trireme: Exploration of Hierarchical Multi-level Parallelism for Hardware Acceleration