Abstract
The ease of use and reconfigurability of FPGAs make them an attractive platform for accelerating algorithms. However, acceleration is challenging because the large number of possible algorithmic and hardware design parameters leads to many different accelerator variants. In this article, we propose techniques for fast design exploration and multi-objective optimization to quickly identify both the algorithmic and the hardware parameters that optimize these accelerators. Data from a small sample of the design space are used to run regression analysis and train mathematical models within a nonlinear optimization framework to identify the optimal algorithm and design parameters under various objectives and constraints. To automate and improve the model generation process, we propose the use of L1-regularized least squares regression techniques. We implement two real-time image processing accelerators as test cases: one for image deblurring and one for block matching. For these designs, we demonstrate that by sampling only a small fraction of the design space (0.42% and 1.1%), our modeling techniques are accurate within 2%--4% for area and throughput, 8%--9% for power, and 5%--6% for arithmetic accuracy. We show speedups of 340× and 90× for the two test cases compared to brute-force enumeration. We also identify the optimal set of parameters for a number of scenarios (e.g., minimizing power under arithmetic inaccuracy bounds).
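As a rough illustration of the L1-regularized least squares regression mentioned above, the following sketch fits a sparse model to a handful of sampled design points. It is not the authors' implementation: the two design parameters, the candidate feature terms, the synthetic "measurements", the penalty weight `lam`, and the choice of a plain proximal-gradient (ISTA) solver are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, w, lam):
    """L1-regularized least squares cost: 0.5*||Xw - y||^2 + lam*||w||_1."""
    r = X @ w - y
    return 0.5 * (r @ r) + lam * np.sum(np.abs(w))

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize the cost above with proximal gradient descent (ISTA)."""
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

def make_features(params):
    """Hypothetical candidate terms for two parameters p, q:
    1, p, q, p*q, p^2, q^2. The L1 penalty decides which ones to keep."""
    p, q = params[:, 0], params[:, 1]
    return np.column_stack([np.ones_like(p), p, q, p * q, p ** 2, q ** 2])

# Synthetic stand-in for measurements taken at sampled design points.
rng = np.random.default_rng(0)
params = rng.uniform(0.0, 1.0, size=(40, 2))
X = make_features(params)
true_w = np.array([1.0, 0.0, 2.0, 0.0, 0.0, 3.0])  # assumed ground truth
y = X @ true_w + 0.01 * rng.standard_normal(40)

w = lasso_ista(X, y, lam=0.1)
print("fitted coefficients:", np.round(w, 3))
```

Larger values of `lam` push more coefficients to exactly zero, which is what allows the set of model terms to be chosen automatically rather than by hand; in practice the penalty weight would typically be picked by cross-validation.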
Index Terms
Fast Design Exploration for Performance, Power and Accuracy Tradeoffs in FPGA-Based Accelerators