Abstract
Dynamic biomedical systems are described mathematically by Ordinary Differential Equations (ODEs), and solving them is often one of the most computationally intensive parts of a biomedical simulation. Because these computations have high inherent parallelism, hardware acceleration based on Field-Programmable Gate Arrays (FPGAs) has great potential to increase the computational performance of model simulations while remaining very power-efficient. However, manual hardware implementation is complex and time-consuming, so the advantages of FPGA designs can only be realised if there is a general solution that automates the process. In this article, we propose a domain-specific high-level synthesis tool called ODoST that automatically generates an FPGA-based Hardware Accelerator Module (HAM) from a high-level description. In this direct approach, the ODEs are mapped directly to processing pipelines without any intermediate architecture layer of processing elements. We evaluate the generated HAMs on real hardware in terms of resource usage, processing speed, and power consumption, and compare them with CPUs and a GPU. The results show that the FPGA implementations can achieve a speedup of 15.3 times over a single-core CPU solution and perform similarly to an auto-generated GPU solution, while being 14.5 times more power-efficient than the CPU and 3.1 times more power-efficient than the optimised GPU solution. Further optimisations are expected to improve these speedups.
ODoST: Automatic Hardware Acceleration for Biomedical Model Integration