Abstract
Power limitations in semiconductors have made explicitly parallel device architectures such as Field-Programmable Gate Arrays (FPGAs) increasingly attractive for scalable systems. However, offsetting the significant cost of FPGA development requires efficient design-space exploration to plan and evaluate a range of candidate algorithm and platform choices before implementation. The authors propose the RC Amenability Test for Scalable Systems (RATSS), an analytical model that extends current modeling concepts to multi-FPGA designs and enables straightforward, fast, and reasonably accurate performance prediction before any implementation is built. RATSS provides a comprehensive strategic model for evaluating applications based on the computation and communication requirements of the algorithm and the capabilities of the FPGA platform. The model targets data-parallel applications on current scalable FPGA systems. In three case studies, RATSS predictions achieved nearly 90% accuracy compared with the corresponding implementations.
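The abstract does not state the RATSS equations. As a rough illustration only, the sketch below shows the general shape of an analytical estimate that combines an algorithm's computation and communication requirements with platform capabilities. All names and parameters (`Platform`, `link_efficiency`, `ops_per_cycle`, and so on) are hypothetical, and the even data split and independent per-FPGA links are simplifying assumptions, not part of RATSS.

```python
import math
from dataclasses import dataclass


@dataclass
class Platform:
    # Hypothetical platform parameters, for illustration only.
    f_clock_hz: float        # FPGA clock frequency
    ops_per_cycle: float     # sustained operations per clock cycle on one FPGA
    link_bytes_per_s: float  # ideal interconnect throughput per FPGA (assumed independent links)
    link_efficiency: float   # fraction of ideal throughput actually achieved, in (0, 1]


def predict_time(n_elements, bytes_per_element, ops_per_element,
                 n_fpgas, platform, double_buffered=False):
    """Estimate run time of a data-parallel job split evenly across n_fpgas.

    Assumption: the busiest FPGA (the one holding ceil(n / P) elements)
    determines the overall time.
    """
    per_fpga = math.ceil(n_elements / n_fpgas)
    effective_bw = platform.link_efficiency * platform.link_bytes_per_s
    t_comm = per_fpga * bytes_per_element / effective_bw
    t_comp = per_fpga * ops_per_element / (platform.f_clock_hz * platform.ops_per_cycle)
    # Double buffering overlaps transfers with computation, so the longer
    # of the two phases dominates; otherwise the phases run back to back.
    return max(t_comm, t_comp) if double_buffered else t_comm + t_comp


def speedup(t_software, t_predicted):
    """Predicted speedup of the FPGA design over a software baseline."""
    return t_software / t_predicted
```

For example, with `Platform(100e6, 10, 1e9, 0.5)` and one million 4-byte elements needing 20 operations each, spread over four FPGAs, each device spends 2 ms communicating and 5 ms computing, so the estimate is 7 ms without overlap and 5 ms with double buffering. An estimate like this is only as good as its inputs, which is why measured efficiency factors matter more than ideal datasheet figures.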
Index Terms
An analytical model for multilevel performance prediction of Multi-FPGA systems