Automatic synthesis of physical system differential equation models to a custom network of general processing elements on FPGAs

Published: 30 September 2013

Abstract

Fast execution of physical system models has various uses, such as simulating physical phenomena or real-time testing of medical equipment. Physical system models commonly consist of thousands of differential equations. Solving such equations using software on microprocessor devices may be slow. Several past efforts implement such models as parallel circuits on special computing devices called Field-Programmable Gate Arrays (FPGAs), demonstrating large speedups due to the excellent match between the massive fine-grained local communication parallelism common in physical models and the fine-grained parallel compute elements and local connectivity of FPGAs. However, past implementation efforts were mostly manual or ad hoc. We present the first method for automatically converting a set of ordinary differential equations into circuits on FPGAs. The method uses a general Processing Element (PE) that we developed, designed to quickly solve a set of ordinary differential equations while using few FPGA resources. The method instantiates a network of general PEs, partitions equations among the PEs to minimize communication, generates each PE's custom program, creates custom connections among PEs, and maintains synchronization of all PEs in the network. Our experiments show that the method generates a 400-PE network on a commercial FPGA that executes four different models on average 15x faster than a 3 GHz Intel processor, 30x faster than a commercial 4-core ARM, 14x faster than a commercial 6-core Texas Instruments digital signal processor, and 4.4x faster than an NVIDIA 336-core graphics processing unit. We also show that the FPGA-based approach is reasonably cost effective compared to using the other platforms. 
The method yields 2.1x faster circuits than a commercial high-level synthesis tool that uses the traditional method for converting behavior to circuits, while using 2x fewer lookup tables and 2x fewer hard-core multiplier (DSP) units, though 3.5x more block RAM because its PEs are programmable. Furthermore, the method does not generate only a single fastest design; by varying the number of PEs, it generates a range of designs that trade off size and performance.
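To make the partitioning step concrete, the following is a minimal sketch of assigning equations to PEs so that equations reading each other's state tend to share a PE. It is an illustration only: the abstract does not state which partitioning algorithm the method uses, so the greedy heuristic here, along with the names `partition_equations`, `communication_cost`, and the 8-equation chain model, are assumptions made for this example.

```python
from collections import defaultdict


def partition_equations(deps, num_pes):
    """Greedily assign equations to PEs (hypothetical heuristic, not the paper's).

    deps maps each equation index to the set of equation indices whose
    state it reads. Each PE holds at most ceil(n / num_pes) equations, and
    an equation goes to the non-full PE that already holds the most of its
    neighbors (ties go to the lowest-numbered PE).
    """
    capacity = -(-len(deps) // num_pes)  # ceiling division
    neighbors = defaultdict(set)
    for eq, reads in deps.items():
        for other in reads:
            if other != eq:
                neighbors[eq].add(other)
                neighbors[other].add(eq)

    assignment = {}
    load = [0] * num_pes
    for eq in sorted(deps):
        best_pe, best_score = None, -1
        for pe in range(num_pes):
            if load[pe] >= capacity:
                continue
            score = sum(1 for nb in neighbors[eq] if assignment.get(nb) == pe)
            if score > best_score:
                best_pe, best_score = pe, score
        assignment[eq] = best_pe
        load[best_pe] += 1
    return assignment


def communication_cost(deps, assignment):
    """Count reads whose producer and consumer sit on different PEs."""
    return sum(
        1
        for eq, reads in deps.items()
        for other in reads
        if assignment[eq] != assignment[other]
    )


# Assumed toy model: a chain of 8 equations, each reading its neighbors.
deps = {i: {j for j in (i - 1, i + 1) if 0 <= j < 8} for i in range(8)}

greedy = partition_equations(deps, num_pes=2)
round_robin = {i: i % 2 for i in deps}

print(communication_cost(deps, greedy))       # 2
print(communication_cost(deps, round_robin))  # 14
```

On this chain, the greedy assignment keeps equations 0 to 3 on one PE and 4 to 7 on the other, so only the two reads across the middle link cross PEs, whereas a round-robin assignment makes every read cross. Sweeping `num_pes` in such a sketch mirrors the size and performance trade-off described above, under the same assumptions.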

