Abstract
Current high-level synthesis systems synthesize arithmetic units with a fixed, known number of stages, and the scheduler mainly determines when units are activated. We focus on scheduling techniques for the high-level synthesis of pipelined arithmetic units where the number of stages of each operation is a free parameter of the synthesis. This problem is motivated by the ability to automatically create pipelined functional units, such as multipliers, with different pipe lengths. These units differ in parallelism level, clock latency, frequency, and other characteristics. This article presents the Variable-length Pipeline Scheduler (VPS). The ability to synthesize variable-length pipelined units expands the known scheduling problem of high-level synthesis to include a search for a minimal number of hardware units (operations) and their desired number of stages. The proposed search procedure is based on algorithms that find a local minimum in a d-dimensional grid, thus avoiding the need to evaluate all possible points in the space. We have implemented a C language compiler for VPS targeting FPGAs. Our results demonstrate that using variable-length pipeline units can reduce the overall resource usage and improve the execution time when synthesized onto an FPGA. The proposed search is sufficiently fast, taking only a few seconds, which allows an interactive mode of work. A comparison with xPilot shows a significant saving of hardware resources while maintaining comparable execution times of the resulting circuits. This work is an extension of a previous paper [Ben-Asher and Rotem 2008].
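The idea of finding a local minimum in a d-dimensional grid without evaluating every point can be illustrated with a minimal sketch. It assumes each grid coordinate is a hypothetical design parameter (for example, a unit count or a stage count) and uses a made-up quadratic cost in place of a real synthesis run. The function and variable names are illustrative, and the method shown is plain steepest descent, which is not necessarily the procedure VPS uses.

```python
def local_minimum(cost, start, lower, upper):
    """Steepest descent on an integer grid.

    Returns (point, cost at point, number of distinct points evaluated).
    """
    cache = {}

    def evaluate(p):
        # Each grid point is costed at most once.
        if p not in cache:
            cache[p] = cost(p)
        return cache[p]

    current = tuple(start)
    while True:
        best = current
        # Look at the 2*d axis-aligned neighbours of the current point.
        for axis in range(len(current)):
            for step in (-1, 1):
                q = list(current)
                q[axis] += step
                if lower[axis] <= q[axis] <= upper[axis]:
                    q = tuple(q)
                    if evaluate(q) < evaluate(best):
                        best = q
        if best == current:
            # No neighbour is strictly cheaper: a local minimum.
            return current, evaluate(current), len(cache)
        current = best


def toy_cost(p):
    # Hypothetical stand-in for "synthesize and measure"; not from the paper.
    return (p[0] - 3) ** 2 + (p[1] - 5) ** 2 + (p[2] - 2) ** 2


# Search an 8 x 8 x 8 grid (512 points) starting from one corner.
point, value, evaluated = local_minimum(toy_cost, (1, 1, 1), (1, 1, 1), (8, 8, 8))
print(point, value, evaluated)
```

On this toy cost the descent visits only a small fraction of the 512 grid points. With a real cost function, each evaluation would correspond to an expensive scheduling or synthesis step, which is why limiting the number of evaluated points matters.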
- Aldous, D. 1983. Minimization algorithms and random walk on the d-cube. Ann. Probab. 11, 2, 403--413.
- Asher, Y. B. and Schohat, E. 2008. Finding the best compromise in compiling compound loops to Verilog. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI.
- Bacon, D. F., Graham, S. L., and Sharp, O. J. 1994. Compiler transformations for high-performance computing. ACM Comput. Surv. 26, 4, 345--420.
- Ben-Asher, Y. and Meisler, D. 2006. Towards a source level compiler: Source level modulo scheduling. In Proceedings of the 5th Workshop on Compile and Runtime Techniques for Parallel Computing (CRTPC).
- Ben-Asher, Y. and Rotem, N. 2008. Synthesis for variable pipelined function units. In Proceedings of the International Symposium on System-on-Chip (SOC'08). IEEE Computer Society, 1--4.
- Ben-Asher, Y. and Rotem, N. 2010. Automatic memory partitioning: Increasing memory parallelism via data structure partitioning. In Proceedings of the International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS).
- Camposano, R. 1991. Path-based scheduling for synthesis. IEEE Trans. Comput. Aid. Des. 10, 1, 85--93.
- Chavet, C., Andriamisaina, C., Coussy, P., Casseau, E., Juin, E., Urard, P., and Martin, E. 2007. A design flow dedicated to multi-mode architectures for DSP applications. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'07).
- Chen, D., Cong, J., Fan, Y., Han, G., Jiang, W., and Zhang, Z. 2005. XPilot: A platform-based behavioral synthesis system. In Proceedings of the SRC TECHCON.
- Cong, J. and Jiang, W. 2008. Pattern-based behavior synthesis for FPGA resource reduction. In Proceedings of the 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays (FPGA'08).
- Devadas, S., Ghosh, A., and Keutzer, K. 1994. Logic Synthesis. McGraw-Hill.
- Dong, Y., Zhou, J., Dou, Y., Deng, L., and Zhao, J. 2008. Impact of loop unrolling on area, throughput and clock frequency for window operations based on a data schedule method. In Proceedings of the Congress on Image and Signal Processing (CISP'08).
- Gajski, D., Dutt, N., Wu, A., and Lin, S. 1994. High-Level Synthesis: Introduction to Chip and System Design. Kluwer Academic Publishers.
- Ghosh, A., Lodha, S. K., and Vemuri, R. 1999. Hierarchical scheduling in high level synthesis using resource sharing across nested loops. In Proceedings of the 9th Great Lakes Symposium on VLSI. 140.
- Hannig, F., Dutta, H., and Teich, J. 2009. Parallelization approaches for hardware accelerators - loop unrolling versus loop partitioning. In Proceedings of the 22nd International Conference on Architecture of Computing Systems (ARCS'09). Springer-Verlag.
- Hannig, F., Ruckdeschel, H., Dutta, H., and Teich, J. 2008. Paro: Synthesis of hardware accelerators for multi-dimensional dataflow-intensive applications. In Reconfigurable Computing: Architectures, Tools and Applications, Lecture Notes in Computer Science, vol. 4943, Springer-Verlag, Berlin, 287--293.
- Hwang, C. T., Lee, J.-H., and Hsu, Y. C. 1991. A formal approach to the scheduling problem in high level synthesis. IEEE Trans. Comput. Aid. Des. Integr. Circuits Syst. 10, 4, 464--475.
- Ishikawa, M. and De Micheli, G. 1991. A module selection algorithm for high-level synthesis. In Proceedings of the IEEE International Symposium on Circuits and Systems. IEEE, 1777--1780.
- Ito, K., Lucke, L., and Parhi, K. 1998. ILP-based cost-optimal DSP synthesis with module selection and data format conversion. IEEE Trans. VLSI Syst. 6, 4, 582--594.
- Kleinberg, J. and Tardos, E. 2006. Algorithm Design. Addison-Wesley.
- Kudlur, M., Fan, K., and Mahlke, S. 2006. Streamroller: Automatic synthesis of prescribed throughput accelerator pipelines. In Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'06). 270--275.
- Kurra, S., Singh, N. K., and Panda, P. R. 2007. The impact of loop unrolling on controller delay in high level synthesis. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE'07).
- Lam, M. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. 318--328.
- Lattner, C. and Adve, V. 2002. The LLVM instruction set and compilation strategy. Tech. rep. UIUCDCS-R-2002-2292, University of Illinois at Urbana-Champaign.
- Llewellyn, D. C., Tovey, C., and Trick, M. 1989. Local optimization on graphs. Discrete Appl. Math. 23, 2.
- Llosa, J. 1996. Swing modulo scheduling: A lifetime-sensitive approach. In Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT'96). IEEE Computer Society.
- Maheshwari, N. and Sapatnekar, S. 1998. Efficient retiming of large circuits. IEEE Trans. VLSI Syst. 6, 1, 74--83.
- Micheli, G. D. 1994. Synthesis and Optimization of Digital Circuits. McGraw-Hill Higher Education.
- Najjar, W. A. 2007. Compiling code accelerators for FPGAs. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'07).
- Park, N. and Parker, A. 1988. SEHWA: A program for synthesis of pipelines. In Proceedings of the 25 Years of DAC: Papers on 25 Years of Electronic Design Automation. 595--601.
- Paulin, P. G. and Knight, J. P. 1987. Force-directed scheduling in automatic data path synthesis. In Proceedings of the 24th ACM/IEEE Conference on Design Automation (DAC'87). 195--202.
- Rau, B. R. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th Annual International Symposium on Microarchitecture. 63--74.
- Rotem, E., Mendelson, A., Ginosar, R., and Weiser, U. 2009. Multiple clock and voltage domains for chip multi processors. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO42). ACM, New York, NY, 459--468.
- Sivaraman, M. and Aditya, S. 2002. Cycle-time aware architecture synthesis of custom hardware accelerators. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'02). 35--42.
- Walker, R. and Chaudhuri, S. 1995. High-level synthesis: Introduction to the scheduling problem. IEEE Design Test Comput. 12, 2, 60--69.
- Weinhardt, M. 1997. Compilation and pipeline synthesis for reconfigurable architectures - high performance by configware. In Proceedings of the Reconfigurable Architecture Workshop.
- Weinhardt, M. and Luk, W. 2001. Pipeline vectorization. IEEE Trans. Comput. Aid. Des. Integr. Circuits Syst. 234--248.
- Wolfe, M. 1991. The tiny loop restructuring research tool. In Proceedings of the International Conference on Parallel Processing.
- Yosi, B.-A. and Nadav, R. 2009. Binary synthesis with multiple memory banks targeting array references. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'09). IEEE Computer Society, 600--603.
Index Terms
The benefits of using variable-length pipelined operations in high-level synthesis