Abstract
High-performance reconfigurable computing involves acceleration of significant portions of an application using reconfigurable hardware. When the hardware tasks of an application cannot simultaneously fit in an FPGA, the task graph needs to be partitioned and scheduled into multiple FPGA configurations, in a way that minimizes the total execution time. This article proposes the Reduced Data Movement Scheduling (RDMS) algorithm that aims to improve the overall performance of hardware tasks by taking into account the reconfiguration time, data dependency between tasks, intertask communication as well as task resource utilization. The proposed algorithm uses the dynamic programming method. A mathematical analysis of the algorithm shows that the execution time would at most exceed the optimal solution by a factor of around 1.6, in the worst-case. Simulations on randomly generated task graphs indicate that RDMS algorithm can reduce interconfiguration communication time by 11% and 44% respectively, compared with two other approaches that consider data dependency and hardware resource utilization only. The practicality, as well as efficiency of the proposed algorithm over other approaches, is demonstrated by simulating a task graph from a real-life application - N-body simulation - along with constraints for bandwidth and FPGA parameters from existing high-performance reconfigurable computers. Experiments on SRC-6 are carried out to validate the approach.
- Bazargan, K., Kastner, R., and Sarrafzadeh, M. 2000. Fast template placement for reconfigurable computing systems. IEEE Des. Test Comput. 17, 1, 68--83. Google Scholar
Digital Library
- Brebner, G. and Diessel, O. 2001. Chip-based reconfigurable task management. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL’01). 182--191. Google Scholar
Digital Library
- Caprara, A. and Pferschy, U. 2004. Worst-case analysis of the subset sum algorithm for bin packing. Oper. Res. Lett. 32, 20, 159--166. Google Scholar
Digital Library
- Coffman, Jr., E. G., Garey, M. R., and Johnson, D. S. 1996. Approximation algorithms for bin packing: a survey. In Approximation Algorithms for NP-Hard Problems. D. Hochbaum Ed., PWS Publishing, Boston. 46--93. Google Scholar
Digital Library
- Compton, K., Li, Z., Cooley, J., Knol, S., and Hauck, S. 2002. Configuration relocation and defragmentation for run-time reconfigurable computing. IEEE Trans. VLSI Syst. 10, 3, 209--220. Google Scholar
Digital Library
- Diessel, O., ElGindy, H., Middendorf, M., Schmeck, H., and Schmidt, B. 2000. Dynamic scheduling of tasks on partially reconfigurable FPGAs. IEE Proc. Comput. Digital Techniq. (Special Issue on Reconfigurable Systems) 147, 3, 181--188.Google Scholar
Cross Ref
- Fekete, S. P., Köhler, E., and Teich, J. 2001. Optimal FPGA module placement with temporal precedence constraints. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE’01). 658--665. Google Scholar
Digital Library
- Govindu, G., Scrofano, R., and Prasanna, V. K. 2005. A library of parameterizable floating-point cores for FPGAs and their application to scientific computing. In Proceedings of the International Conference on Engineering Reconfigurable Systems and Algorithms (ERSA’05). 137--145.Google Scholar
- Handa, M. and Vemuri, R. 2004. A fast algorithm for finding maximal empty rectangles for dynamic FPGA placement. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE’04). Vol. 1. 744--745. Google Scholar
Digital Library
- Hemmert, K. S. and Underwood, K. D. 2006. Open source high performance floating-point modules. In Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’06). 349--350. Google Scholar
Digital Library
- Huang, M., Simmler, H., Saha, P., and El-Ghazawi, T. 2008. Hardware task scheduling optimizations for reconfigurable computing. In Proceedings of the 2nd International Workshop on High-Performance Reconfigurable Computing Technology and Applications (HPRCTA’08).Google Scholar
- Huang, M., Simmler, H., Serres, O., and El-Ghazawi, T. 2009. RDMS: A hardware task scheduling algorithm for reconfigurable computing. In Proceedings of the 16th Reconfigurable Architectures Workshop (RAW’09). Google Scholar
Digital Library
- Kellerer, H., Pferschy, U., and Pisinger, D. 2004. Knapsack Problems. Springer, Berlin.Google Scholar
- Kleinberg, J. and Tardos, É. 2005. Algorithm Design. Pearson/Addison-Wesley, Boston, MA. Google Scholar
Digital Library
- Lienhart, G., Kugel, A., and Männer, R. 2002. Using floating-point arithmetic on FPGAs to accelerate scientific N-body simulations. In Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’02). 182--191. Google Scholar
Digital Library
- Lucy, L. B. 1977. A numerical approach to the testing of the fission hypothesis. Astronom. J. 82, 12, 1013--1024.Google Scholar
Cross Ref
- Monaghan, J. J. and Lattanzio, J. C. 1985. A refined particle method for astrophysical problems. Astron. Astrophys. 149, 135--143.Google Scholar
- Pisinger, D. 1999. Linear time algorithms for knapsack problems with bounded weights. J. Algor. 33, 1, 1--14. Google Scholar
Digital Library
- Saha, P. 2007. Automatic software hardware co-design for reconfigurable computing systems. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL’07). 507--508.Google Scholar
Cross Ref
- Thakkar, A. J. and Ejnioui, A. 2006. Design and implementation of double precision floating point division and square root on FPGAs. In Proceedings of the IEEE Aerospace Conference.Google Scholar
- Walder, H. and Platzner, M. 2002. Non-preemptive multitasking on fpga: Task placement and footprint transform. In Proceedings of the 2nd International Conference on Engineering of Reconfigurable Systems and Architectures (ERSA). 24--30.Google Scholar
- Walder, H., Steiger, C., and Platzner, M. 2003. Fast online task placement on FPGAs: free space partitioning and 2D-hashing. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS’03). 178--185. Google Scholar
Digital Library
- Wiangtong, T., Cheung, P., and Luk, W. 2003. Multitasking in hardware-software codesign for reconfigurable computer. In Proceedings of the International Symposium on Circuits and Systems (ISCAS’03). Vol. 5. 621--624.Google Scholar
- Zhuo, L. and Prasanna, V. K. 2007. Scalable and modular algorithms for floating-point matrix multiplication on reconfigurable computing systems. IEEE Trans. Para. Distrib. Syst. 18, 4, 433--448. Google Scholar
Digital Library
Index Terms
Reconfiguration and Communication-Aware Task Scheduling for High-Performance Reconfigurable Computing
Recommendations
Exploiting Partial Runtime Reconfiguration for High-Performance Reconfigurable Computing
Runtime Reconfiguration (RTR) has been traditionally utilized as a means for exploiting the flexibility of High-Performance Reconfigurable Computers (HPRCs). However, the RTR feature comes with the cost of high configuration overhead which might ...
Configuration locking and schedulability estimation for reduced reconfiguration overheads of reconfigurable systems
Dynamically reconfigurable field-programmable gate arrays (FPGAs) hold the promise of providing a virtual hardware resource in which hardware circuits can be dynamically scheduled onto the available FPGA resources. However, reconfiguring an FPGA can ...
Performance bounds of partial run-time reconfiguration in high-performance reconfigurable computing
HPRCTA '07: Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07High-Performance Reconfigurable Computing (HPRC) systems have always been characterized by their high performance and flexibility. Flexibility has been traditionally exploited through the Run-Time Reconfiguration (RTR) provided by most of the available ...






Comments