An analytical model for multilevel performance prediction of Multi-FPGA systems

Published: 22 August 2011

Abstract

Power limitations in semiconductors have made explicitly parallel device architectures such as Field-Programmable Gate Arrays (FPGAs) increasingly attractive for use in scalable systems. However, mitigating the significant cost of FPGA development requires efficient design-space exploration to plan and evaluate a range of potential algorithm and platform choices prior to implementation. The authors propose the RC Amenability Test for Scalable Systems (RATSS), an analytical model which enables straightforward, fast, and reasonably accurate performance prediction prior to implementation by extending current modeling concepts to multi-FPGA designs. RATSS provides a comprehensive strategic model to evaluate applications based on the computation and communication requirements of the algorithm and capabilities of the FPGA platform. The RATSS model targets data-parallel applications on current scalable FPGA systems. Three case studies with RATSS demonstrate nearly 90% prediction accuracy as compared to corresponding implementations.
