Abstract
Finite-difference methods are computationally intensive and are required by many applications. Parameters of a finite-difference algorithm, such as grid size, can be varied to generate a design space containing algorithm instances with different constant coefficients. An algorithm instance with specific coefficients can either be mapped onto general operators to construct static designs, or be implemented with constant-specific operators to form dynamic designs, which require runtime reconfiguration to update the algorithm coefficients. This article proposes a tuning method that explores the design space to optimise both static and dynamic designs, and an evaluation method that selects the design with maximum overall throughput, based on algorithm characteristics, design properties, available resources and runtime data size. For the benchmark applications, option pricing and Reverse-Time Migration (RTM), resource consumption is reduced by over 50% for both static and dynamic designs while precision requirements are still met. For a single hardware implementation, the RTM design optimised with the proposed approach is expected to run 1.8 times faster than the best published design. The tuned static designs run thousands of times faster than the dynamic designs for algorithms with small data sizes, while the tuned dynamic designs achieve up to 5.9 times speedup over the corresponding static designs for large-scale finite-difference algorithms.
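The trade-off between static and dynamic designs can be illustrated with a minimal sketch. It is not the article's evaluation method: it assumes a deliberately simple cost model in which a dynamic design pays a one-off reconfiguration delay before processing at a higher rate, and the function names, parameters and numbers are hypothetical.

```python
# Illustrative cost model only (an assumption, not the article's method):
# overall throughput = data size / (reconfiguration time + compute time).

def overall_throughput(data_items, items_per_second, reconfig_seconds=0.0):
    """Effective items/s once a one-off reconfiguration delay is included."""
    if data_items <= 0 or items_per_second <= 0 or reconfig_seconds < 0:
        raise ValueError("sizes and rates must be positive, delay non-negative")
    compute_seconds = data_items / items_per_second
    return data_items / (reconfig_seconds + compute_seconds)


def select_design(data_items, static_rate, dynamic_rate, reconfig_seconds):
    """Return (name, throughput) of the design with higher overall throughput."""
    static = overall_throughput(data_items, static_rate)
    dynamic = overall_throughput(data_items, dynamic_rate, reconfig_seconds)
    return ("dynamic", dynamic) if dynamic > static else ("static", static)


# Hypothetical figures: the dynamic design is 4x faster once configured,
# but costs 1 second of reconfiguration before it can start.
STATIC_RATE = 1e9    # items/s, assumed
DYNAMIC_RATE = 4e9   # items/s, assumed
RECONFIG_S = 1.0     # seconds, assumed

small_choice = select_design(1e4, STATIC_RATE, DYNAMIC_RATE, RECONFIG_S)
large_choice = select_design(1e12, STATIC_RATE, DYNAMIC_RATE, RECONFIG_S)
print(small_choice[0], large_choice[0])
```

Under these assumed figures, the reconfiguration delay dominates for the small data set and is amortised for the large one, which matches the qualitative behaviour the abstract reports.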
A Self-Aware Tuning and Self-Aware Evaluation Method for Finite-Difference Applications in Reconfigurable Systems