
Modeling and Analyzing Dataflow Applications on NoC-Based Many-Core Architectures

Published: 21 April 2015
Abstract

The advent of chip-level parallel architectures prompted a renewal of interest in dataflow process networks. The trend is to model an application independently of the architecture and then to transform the model to best fit the target architecture. One often-overlooked aspect is the mapping of communications onto the on-chip topology, even though the cost of such communications often dominates that of computations.

This article introduces a dataflow process network called the K-periodically Routed Graph (KRG), which represents the various routing decisions made during the transformation of an original application into an architecture-aware version of that application.
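
As a rough illustration of what a periodically repeating routing decision could look like, the sketch below assumes a node that dispatches each incoming token to one of two outputs following a binary pattern that repeats forever, and a matching node that interleaves the two streams again. The function names `k_periodic_select` and `k_periodic_merge`, the two-output restriction, and the binary-pattern encoding are assumptions made for this example; the abstract does not define them.

```python
from itertools import cycle
from typing import Iterable, List, Sequence, Tuple


def k_periodic_select(tokens: Iterable, pattern: Sequence[int]) -> Tuple[List, List]:
    """Dispatch each token to output 0 or 1, following a repeating binary pattern.

    Hypothetical helper: the pattern encoding is an assumption of this sketch.
    """
    if not pattern or any(bit not in (0, 1) for bit in pattern):
        raise ValueError("pattern must be a non-empty sequence of 0s and 1s")
    outputs: Tuple[List, List] = ([], [])
    # cycle() repeats the pattern indefinitely; zip() stops when tokens run out.
    for token, bit in zip(tokens, cycle(pattern)):
        outputs[bit].append(token)
    return outputs


def k_periodic_merge(left: Iterable, right: Iterable, pattern: Sequence[int]) -> List:
    """Interleave two streams following the same repeating binary pattern.

    Stops as soon as the stream selected by the pattern has no token left.
    """
    if not pattern or any(bit not in (0, 1) for bit in pattern):
        raise ValueError("pattern must be a non-empty sequence of 0s and 1s")
    left_it, right_it = iter(left), iter(right)
    merged: List = []
    for bit in cycle(pattern):
        source = right_it if bit else left_it
        try:
            merged.append(next(source))
        except StopIteration:
            break
    return merged


# Route six tokens with the pattern 0,1,1 and then restore their order.
out0, out1 = k_periodic_select(range(6), [0, 1, 1])
restored = k_periodic_merge(out0, out1, [0, 1, 1])
```

Because both nodes follow the same fixed pattern, the token order on each path is known statically, which is the kind of property that makes such routing amenable to analysis.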

