
The benefits of using variable-length pipelined operations in high-level synthesis

Published: 24 December 2013

Abstract

Current high-level synthesis systems synthesize arithmetic units with a fixed, known number of stages, and the scheduler mainly determines when units are activated. We focus on scheduling techniques for the high-level synthesis of pipelined arithmetic units where the number of stages of each operation is a free parameter of the synthesis. This problem is motivated by the ability to automatically create pipelined functional units, such as multipliers, with different pipe lengths. These units differ in characteristics such as parallelism level, clock latency, and frequency. This article presents the Variable-length Pipeline Scheduler (VPS). The ability to synthesize variable-length pipelined units expands the known scheduling problem of high-level synthesis to include a search for a minimal number of hardware units (operations) and their desired number of stages. The proposed search procedure is based on algorithms that find a local minimum in a d-dimensional grid, thus avoiding the need to evaluate all possible points in the space. We have implemented a C language compiler for VPS targeting FPGAs. Our results demonstrate that using variable-length pipeline units can reduce the overall resource usage and improve the execution time when synthesized onto an FPGA. The proposed search is sufficiently fast, taking only a few seconds, which allows an interactive mode of work. A comparison with xPilot shows a significant saving of hardware resources while maintaining comparable execution times of the resulting circuits. This work is an extension of a previous paper [Ben-Asher and Rotem 2008].
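
The idea of searching a d-dimensional grid for a local minimum, rather than evaluating every point, can be illustrated with a minimal sketch. This is not the VPS search procedure itself, which the abstract does not spell out; it is a plain greedy descent used as a stand-in. The names `local_minimum`, `neighbors`, and `toy_cost` are hypothetical, and `toy_cost` is an invented placeholder for whatever the real flow would measure after scheduling a design with a given number of units and pipeline stages.

```python
def neighbors(point, lower, upper):
    """Yield grid points that differ from `point` by +/-1 in exactly one coordinate."""
    for i in range(len(point)):
        for step in (-1, 1):
            v = point[i] + step
            if lower[i] <= v <= upper[i]:
                yield point[:i] + (v,) + point[i + 1:]


def local_minimum(cost, start, lower, upper):
    """Greedy descent on a bounded integer grid.

    Moves to the cheapest neighbor while it is strictly cheaper than the
    current point. Returns (point, cost, number_of_points_evaluated).
    """
    cache = {}

    def evaluate(p):
        # Each grid point is evaluated at most once; in a real flow this
        # call would be the expensive step (assumption).
        if p not in cache:
            cache[p] = cost(p)
        return cache[p]

    current = tuple(start)
    while True:
        best = min(neighbors(current, lower, upper), key=evaluate, default=current)
        if evaluate(best) >= evaluate(current):
            return current, evaluate(current), len(cache)
        current = best


def toy_cost(p):
    # Hypothetical cost: coordinates stand for (number of units, pipeline stages).
    units, stages = p
    return (units - 2) ** 2 + (stages - 5) ** 2


# Search an 8 x 8 grid (64 points) starting from one corner.
point, value, evaluated = local_minimum(toy_cost, (8, 1), (1, 1), (8, 8))
print(point, value, evaluated)
```

On this toy cost the descent stops at a point no neighbor improves on after evaluating only a fraction of the grid. A local minimum is not guaranteed to be the global one for an arbitrary cost function.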

References

  1. Aldous, D. 1983. Minimization algorithms and random walk on the d-cube. Ann. Probab. 11, 2, 403--413.
  2. Asher, Y. B. and Schohat, E. 2008. Finding the best compromise in compiling compound loops to Verilog. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI.
  3. Bacon, D. F., Graham, S. L., and Sharp, O. J. 1994. Compiler transformations for high-performance computing. ACM Comput. Surv. 26, 4, 345--420.
  4. Ben-Asher, Y. and Meisler, D. 2006. Towards a source level compiler: Source level modulo scheduling. In Proceedings of the 5th Workshop on Compile and Runtime Techniques for Parallel Computing (CRTPC).
  5. Ben-Asher, Y. and Rotem, N. 2008. Synthesis for variable pipelined function units. In Proceedings of the International Symposium on System-on-Chip (SOC'08). IEEE Computer Society, 1--4.
  6. Ben-Asher, Y. and Rotem, N. 2010. Automatic memory partitioning: Increasing memory parallelism via data structure partitioning. In Proceedings of the International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS).
  7. Camposano, R. 1991. Path-based scheduling for synthesis. IEEE Trans. Comput. Aid. Des. 10, 1, 85--93.
  8. Chavet, C., Andriamisaina, C., Coussy, P., Casseau, E., Juin, E., Urard, P., and Martin, E. 2007. A design flow dedicated to multi-mode architectures for DSP applications. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'07).
  9. Chen, D., Cong, J., Fan, Y., Han, G., Jiang, W., and Zhang, Z. 2005. XPilot: A platform-based behavioral synthesis system. In Proceedings of the SRC TECHCON.
  10. Cong, J. and Jiang, W. 2008. Pattern-based behavior synthesis for FPGA resource reduction. In Proceedings of the 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays (FPGA'08).
  11. Devadas, S., Ghosh, A., and Keutzer, K. 1994. Logic Synthesis. McGraw-Hill.
  12. Dong, Y., Zhou, J., Dou, Y., Deng, L., and Zhao, J. 2008. Impact of loop unrolling on area, throughput and clock frequency for window operations based on a data schedule method. In Proceedings of the Congress on Image and Signal Processing (CISP'08).
  13. Gajski, D., Dutt, N., Wu, A., and Lin, S. 1994. High-Level Synthesis: Introduction to Chip and System Design. Kluwer Academic Publishers.
  14. Ghosh, A., Lodha, S. K., and Vemuri, R. 1999. Hierarchical scheduling in high level synthesis using resource sharing across nested loops. In Proceedings of the 9th Great Lakes Symposium on VLSI. 140.
  15. Hannig, F., Dutta, H., and Teich, J. 2009. Parallelization approaches for hardware accelerators - loop unrolling versus loop partitioning. In Proceedings of the 22nd International Conference on Architecture of Computing Systems (ARCS'09). Springer-Verlag.
  16. Hannig, F., Ruckdeschel, H., Dutta, H., and Teich, J. 2008. Paro: Synthesis of hardware accelerators for multi-dimensional dataflow-intensive applications. In Reconfigurable Computing: Architectures, Tools and Applications, Lecture Notes in Computer Science, vol. 4943, Springer-Verlag, Berlin, 287--293.
  17. Hwang, C. T., Lee, J.-H., and Hsu, Y. C. 1991. A formal approach to the scheduling problem in high level synthesis. IEEE Trans. Comput. Aid. Des. Integr. Circuits Syst. 10, 4, 464--475.
  18. Ishikawa, M. and De Micheli, G. 1991. A module selection algorithm for high-level synthesis. In Proceedings of the IEEE International Symposium on Circuits and Systems. IEEE, 1777--1780.
  19. Ito, K., Lucke, L., and Parhi, K. 1998. ILP-based cost-optimal DSP synthesis with module selection and data format conversion. IEEE Trans. VLSI Syst. 6, 4, 582--594.
  20. Kleinberg, J. and Tardos, E. 2006. Algorithm Design. Addison-Wesley.
  21. Kudlur, M., Fan, K., and Mahlke, S. 2006. Streamroller: Automatic synthesis of prescribed throughput accelerator pipelines. In Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'06). 270--275.
  22. Kurra, S., Singh, N. K., and Panda, P. R. 2007. The impact of loop unrolling on controller delay in high level synthesis. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE'07).
  23. Lam, M. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. 318--328.
  24. Lattner, C. and Adve, V. 2002. The LLVM instruction set and compilation strategy. Tech. rep. UIUCDCS-R-2002-2292, University of Illinois at Urbana-Champaign.
  25. Llewellyn, D. C., Tovey, C., and Trick, M. 1989. Local optimization on graphs. Discrete Appl. Math. 23, 2.
  26. Llosa, J. 1996. Swing modulo scheduling: A lifetime-sensitive approach. In Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT'96). IEEE Computer Society.
  27. Maheshwari, N. and Sapatnekar, S. 1998. Efficient retiming of large circuits. IEEE Trans. VLSI Syst. 6, 1, 74--83.
  28. Micheli, G. D. 1994. Synthesis and Optimization of Digital Circuits. McGraw-Hill Higher Education.
  29. Najjar, W. A. 2007. Compiling code accelerators for FPGAs. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'07).
  30. Park, N. and Parker, A. 1988. SEHWA: A program for synthesis of pipelines. In Proceedings of the 25 Years of DAC: Papers on 25 Years of Electronic Design Automation. 595--601.
  31. Paulin, P. G. and Knight, J. P. 1987. Force-directed scheduling in automatic data path synthesis. In Proceedings of the 24th ACM/IEEE Conference on Design Automation (DAC'87). 195--202.
  32. Rau, B. R. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th Annual International Symposium on Microarchitecture. 63--74.
  33. Rotem, E., Mendelson, A., Ginosar, R., and Weiser, U. 2009. Multiple clock and voltage domains for chip multi processors. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO42). ACM, New York, NY, 459--468.
  34. Sivaraman, M. and Aditya, S. 2002. Cycle-time aware architecture synthesis of custom hardware accelerators. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'02). 35--42.
  35. Walker, R. and Chaudhuri, S. 1995. High-level synthesis: Introduction to the scheduling problem. IEEE Design Test Comput. 12, 2, 60--69.
  36. Weinhardt, M. 1997. Compilation and pipeline synthesis for reconfigurable architectures - high performance by configware. In Proceedings of the Reconfigurable Architecture Workshop.
  37. Weinhardt, M. and Luk, W. 2001. Pipeline vectorization. IEEE Trans. Comput. Aid. Des. Integr. Circuits Syst. 234--248.
  38. Wolfe, M. 1991. The tiny loop restructuring research tool. In Proceedings of the International Conference on Parallel Processing.
  39. Ben-Asher, Y. and Rotem, N. 2009. Binary synthesis with multiple memory banks targeting array references. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'09). IEEE Computer Society, 600--603.
