Abstract
The advent of chip-level parallel architectures has prompted renewed interest in dataflow process networks. The usual approach is to model an application independently of the architecture and then transform the model to best fit the target architecture. One often-overlooked aspect is the mapping of communications onto the on-chip topology, even though the cost of such communications frequently outweighs that of computations.
This article introduces a dataflow process network model called K-periodically Routed Graph (KRG), which represents the routing decisions made while transforming an original application into an architecture-aware version of that application.
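As the name suggests, routing in a KRG follows K-periodic patterns. The sketch below assumes, for illustration only, that each routing node is driven by an ultimately periodic binary word written u.(v)^omega: a finite prefix u followed by the pattern v repeated forever. A select node sends each incoming token to one of two outputs, and a merge node takes each outgoing token from one of two inputs, as the word dictates. The helper names `periodic_word`, `select`, and `merge` are hypothetical and not taken from the article.

```python
from itertools import chain, cycle, islice
from typing import Iterable, Iterator, List, Sequence, Tuple


def periodic_word(prefix: Sequence[int], period: Sequence[int]) -> Iterator[int]:
    """Yield the ultimately periodic binary word prefix.(period)^omega (hypothetical helper)."""
    if not period:
        raise ValueError("the periodic part must be non-empty")
    if any(bit not in (0, 1) for bit in chain(prefix, period)):
        raise ValueError("routing words are binary: use 0 or 1")
    return chain(prefix, cycle(period))


def select(tokens: Iterable[object], prefix: Sequence[int],
           period: Sequence[int]) -> Tuple[List[object], List[object]]:
    """Select node: send each input token to output 0 or 1 as the routing word dictates."""
    outputs: Tuple[List[object], List[object]] = ([], [])
    # zip stops when the finite token stream is exhausted; the word itself is infinite.
    for token, port in zip(tokens, periodic_word(prefix, period)):
        outputs[port].append(token)
    return outputs


def merge(in0: Iterable[object], in1: Iterable[object], prefix: Sequence[int],
          period: Sequence[int], n: int) -> List[object]:
    """Merge node: produce n tokens, taking each one from the input the routing word names."""
    inputs = (iter(in0), iter(in1))
    result: List[object] = []
    for port in islice(periodic_word(prefix, period), n):
        try:
            result.append(next(inputs[port]))
        except StopIteration:
            raise ValueError(f"input {port} ran out of tokens") from None
    return result


if __name__ == "__main__":
    # The word 1.(001)^omega routes tokens to ports 1,0,0,1,0,0,1,...
    out0, out1 = select(range(7), prefix=[1], period=[0, 0, 1])
    print(out0, out1)  # [1, 2, 4, 5] [0, 3, 6]
    # Merging with the same word restores the original token order.
    print(merge(out0, out1, [1], [0, 0, 1], n=7))  # [0, 1, 2, 3, 4, 5, 6]
```

Driving the merge node with the same word as the select node reassembles the stream in its original order, which is one simple way to check that a pair of routing decisions is consistent.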
Modeling and Analyzing Dataflow Applications on NoC-Based Many-Core Architectures