Abstract
Stream processing applications running on Heterogeneous Multi-Processor Systems on Chips (HMPSoCs) require efficient resource allocation and management, both at compile time and at runtime. To cope with modern adaptive applications whose behavior cannot be exhaustively predicted at compile time, runtime managers must be able to make resource allocation decisions on the fly, with minimal overhead on application performance.
Resource allocation algorithms often rely on an internal model of an application. Directed Acyclic Graphs (DAGs) are the most commonly used models for capturing control and data dependencies between tasks. In particular, DAGs are often used as an intermediate representation for deploying applications modeled with a dataflow Model of Computation (MoC) on HMPSoCs. Building such an intermediate representation at runtime for massively parallel applications is costly in terms of both computation and memory overhead.
In this paper, a new intermediate representation of DAGs for resource allocation is presented. This representation shows improved performance for runtime analysis of dataflow graphs, with less overhead in both computation time and memory footprint. The performance of the proposed representation is evaluated on a set of computer vision and machine learning applications.
Numerical Representation of Directed Acyclic Graphs for Efficient Dataflow Embedded Resource Allocation