Abstract
We present a technique for implementing dataflow networks as compositional hardware circuits. We first define an abstract dataflow model with unbounded buffers that supports data-dependent blocks (mux, demux, and nondeterministic merge); we then show how to faithfully implement such networks with bounded buffers and handshaking.
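To make the unbounded-buffer model concrete, here is a minimal Python sketch of firing rules for the three data-dependent blocks named above. It assumes each channel is an unbounded FIFO (a `collections.deque`) and that select tokens are encoded as 0 or 1. The function names `fire_mux`, `fire_demux`, and `fire_merge` are hypothetical, and the rules are an illustrative reading rather than the paper's formal definitions.

```python
from collections import deque
import random


def fire_mux(sel, in0, in1, out):
    """Two-way mux: consume a select token (0 or 1), then one token from the
    input it names, and emit that token. Returns True if the block fired.
    The select token is consumed only when the chosen input has data."""
    if not sel:
        return False
    chosen = in1 if sel[0] else in0
    if not chosen:
        return False  # block: wait for the selected input; leave sel untouched
    sel.popleft()
    out.append(chosen.popleft())
    return True


def fire_demux(sel, inp, out0, out1):
    """Two-way demux: consume a select token and a data token, and route the
    data token to the output the select token names."""
    if not sel or not inp:
        return False
    dest = out1 if sel.popleft() else out0
    dest.append(inp.popleft())
    return True


def fire_merge(in0, in1, out, rng=random):
    """Nondeterministic merge: forward a token from any input that has one.
    The choice between two nonempty inputs is arbitrary (random here)."""
    ready = [q for q in (in0, in1) if q]
    if not ready:
        return False
    out.append(rng.choice(ready).popleft())
    return True


if __name__ == "__main__":
    sel, in0, in1, out = deque([0, 1, 1, 0]), deque("ab"), deque("xy"), deque()
    while fire_mux(sel, in0, in1, out):
        pass
    print(list(out))  # ['a', 'x', 'y', 'b']
```

The mux and demux are deterministic functions of their input streams, while the merge is not: the order in which it interleaves its inputs can vary from run to run, although each input's own order is preserved.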
Handshaking makes the circuits compositional: they can be connected with or without buffers between them, and a combinational cycle arises only from a cycle that contains no buffers at all. Although bounding buffer sizes can cause the system to deadlock prematurely, the system is guaranteed to produce the same, correct data up to that point. Thus, unless the system deadlocks, inserting or removing buffers affects only its performance. We demonstrate how this property enables design-space exploration.
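The claim that buffers affect only performance can be illustrated with a small cycle-level simulation. The sketch below is a hypothetical model, not the paper's circuits: a linear chain of bounded FIFOs joined by valid/ready handshakes, where a token crosses a channel only when the sender holds a token (valid) and the receiver has a free slot (ready). It assumes every buffer holds at least one token and, as a modelling choice, bases every transfer on start-of-cycle occupancies. The name `run_chain` and its parameters are illustrative.

```python
from collections import deque
import random


def run_chain(capacities, inputs, sink_stall_prob=0.3, seed=0):
    """Cycle-level sketch of a linear chain of bounded FIFO buffers joined by
    valid/ready handshakes, fed by a source and drained by a sink that stalls
    at random. A token crosses a channel only when the sender holds a token
    (valid) and the receiver has a free slot (ready). Every decision uses
    start-of-cycle occupancies (a modelling assumption).

    Assumes every capacity is at least 1 (no unbuffered links).
    Returns (tokens received by the sink, cycles taken).
    """
    assert capacities and all(c >= 1 for c in capacities)
    rng = random.Random(seed)
    bufs = [deque() for _ in capacities]
    src = deque(inputs)
    out = []
    cycles = 0
    while len(out) < len(inputs):
        cycles += 1
        occ = [len(b) for b in bufs]  # snapshot at the start of the cycle
        # Handshake decisions: valid (sender nonempty) and ready (receiver not full).
        sink_fires = occ[-1] > 0 and rng.random() >= sink_stall_prob
        link_fires = [occ[i] > 0 and occ[i + 1] < capacities[i + 1]
                      for i in range(len(bufs) - 1)]
        src_fires = bool(src) and occ[0] < capacities[0]
        # Apply all transfers; each buffer is popped and pushed at most once.
        if sink_fires:
            out.append(bufs[-1].popleft())
        for i in reversed(range(len(bufs) - 1)):
            if link_fires[i]:
                bufs[i + 1].append(bufs[i].popleft())
        if src_fires:
            bufs[0].append(src.popleft())
    return out, cycles


if __name__ == "__main__":
    data = list(range(20))
    for caps in ([1], [2], [1, 1, 1, 1], [4, 1, 3]):
        out, cycles = run_chain(caps, data)
        print(caps, out == data, cycles)
```

Every buffer configuration delivers exactly the input sequence; only the cycle count changes. With `sink_stall_prob=0.0`, for example, a single two-entry buffer delivers 20 tokens in 21 cycles, whereas a single one-entry buffer needs 40 under this model. The sketch covers only a linear chain; it does not model forks, joins, or the premature deadlock that undersized buffers can cause in more general networks.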