Abstract
Fast execution of physical system models has various uses, such as simulating physical phenomena or testing medical equipment in real time. Physical system models commonly consist of thousands of differential equations, and solving such equations in software on microprocessors may be slow. Several past efforts implemented such models as parallel circuits on Field-Programmable Gate Arrays (FPGAs) and demonstrated large speedups, owing to the close match between the massive fine-grained parallelism and local communication common in physical models and the fine-grained parallel compute elements and local connectivity of FPGAs. However, past implementation efforts were mostly manual or ad hoc. We present the first method for automatically converting a set of ordinary differential equations into circuits on FPGAs. The method uses a general Processing Element (PE) that we developed, designed to quickly solve a set of ordinary differential equations while using few FPGA resources. The method instantiates a network of general PEs, partitions equations among the PEs to minimize communication, generates each PE's custom program, creates custom connections among PEs, and maintains synchronization of all PEs in the network. Our experiments show that the method generates a 400-PE network on a commercial FPGA that executes four different models on average 15x faster than a 3 GHz Intel processor, 30x faster than a commercial 4-core ARM, 14x faster than a commercial 6-core Texas Instruments digital signal processor, and 4.4x faster than an NVIDIA 336-core graphics processing unit. We also show that the FPGA-based approach is reasonably cost-effective compared to the other platforms.
The method yields 2.1x faster circuits than a commercial high-level synthesis tool that uses the traditional method for converting behavior to circuits, while using 2x fewer lookup tables and 2x fewer hard-core multiplier (DSP) units, though 3.5x more block RAM because the PEs are programmable. Furthermore, the method does not generate only a single fastest design; by varying the number of PEs, it generates a range of designs that trade off size and performance.
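The partitioning and synchronization steps named in the abstract can be illustrated with a small software sketch. Everything here is an assumption made for illustration: the chain-shaped toy model, the function names, the two simple partitioning heuristics, and the use of forward Euler are not taken from the paper, which does not specify its partitioner or integration scheme in this section. The sketch only shows why grouping neighboring equations on the same PE reduces inter-PE communication, and that a synchronized step gives the same result regardless of how the equations are assigned.

```python
from typing import Dict, List, Set


def chain_dependencies(n: int) -> Dict[int, Set[int]]:
    """Hypothetical model: equation i depends on its neighbors i-1 and i+1."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def partition_contiguous(n: int, num_pes: int) -> List[List[int]]:
    """Assign equations to PEs in contiguous blocks (a simple locality heuristic)."""
    base, extra = divmod(n, num_pes)
    parts, start = [], 0
    for p in range(num_pes):
        size = base + (1 if p < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def partition_round_robin(n: int, num_pes: int) -> List[List[int]]:
    """Assign equations to PEs cyclically, ignoring locality (for comparison)."""
    return [list(range(p, n, num_pes)) for p in range(num_pes)]


def cut_size(deps: Dict[int, Set[int]], parts: List[List[int]]) -> int:
    """Count dependencies whose two equations live on different PEs."""
    owner = {eq: p for p, eqs in enumerate(parts) for eq in eqs}
    return sum(1 for eq, ds in deps.items() for d in ds if owner[eq] != owner[d])


def euler_step(x: List[float], deps: Dict[int, Set[int]],
               parts: List[List[int]], dt: float) -> List[float]:
    """One synchronized forward-Euler step of dx_i/dt = -2*x_i + sum(neighbors).

    Every PE reads only the previous state and writes only its own equations,
    which models all PEs advancing in lockstep.
    """
    new_x = list(x)
    for eqs in parts:  # in hardware, the PEs would run concurrently
        for i in eqs:
            neighbors = sum(x[j] for j in deps[i])
            new_x[i] = x[i] + dt * (-2.0 * x[i] + neighbors)
    return new_x


deps = chain_dependencies(8)
contiguous = partition_contiguous(8, 4)
round_robin = partition_round_robin(8, 4)
state = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
next_state = euler_step(state, deps, contiguous, dt=0.1)
```

In this toy case the contiguous assignment cuts fewer dependencies than the round-robin one, so fewer values would have to cross between PEs on each step. A real partitioner would work on an arbitrary dependency graph rather than a chain.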
Index Terms
Automatic synthesis of physical system differential equation models to a custom network of general processing elements on FPGAs