Abstract
This article introduces a new technique to minimize the memory footprints of Digital Signal Processing (DSP) applications specified with Synchronous Dataflow (SDF) graphs and implemented on shared-memory Multiprocessor Systems-on-Chips (MPSoCs). In addition to the SDF specification, which captures data dependencies between coarse-grained tasks called actors, the proposed technique relies on two optional inputs that abstract the internal data dependencies of actors: annotations on the ports of actors, and script-based specifications of merging opportunities between the input and output buffers of actors. Experimental results on a set of applications show that the technique reduces the memory footprint by 48% compared with state-of-the-art minimization techniques.
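The idea of merging an actor's input and output buffers can be illustrated with a minimal sketch. It is not the article's algorithm: it assumes a purely linear chain of actors fired in sequence, ignores scheduling and buffer-lifetime analysis, and uses a hypothetical `in_place` flag as a stand-in for the port annotations and scripts mentioned above. The `Actor` class, the `footprint` function, the actor names, and all buffer sizes are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Actor:
    name: str
    out_size: int   # bytes written to the output buffer (illustrative value)
    in_place: bool  # hypothetical flag: output may overwrite the input buffer


def footprint(source_size: int, chain: List[Actor], merge: bool) -> int:
    """Bytes needed for all buffers of a linear actor chain.

    Without merging, every buffer gets its own memory. With merging, an
    actor flagged in_place reuses its input buffer for its output, so the
    shared region only needs the larger of the two sizes.
    """
    total = 0
    group = source_size  # size of the memory region currently being shared
    for actor in chain:
        if merge and actor.in_place:
            # Output is merged into the input region.
            group = max(group, actor.out_size)
        else:
            # Output needs a fresh region; close the current one.
            total += group
            group = actor.out_size
    return total + group


# Illustrative chain: sizes and names are made up for this sketch.
SOURCE_SIZE = 1024
CHAIN = [
    Actor("filter", 1024, True),
    Actor("downsample", 512, True),
    Actor("fft", 1024, False),
]

if __name__ == "__main__":
    print("separate buffers:", footprint(SOURCE_SIZE, CHAIN, merge=False))
    print("merged buffers:  ", footprint(SOURCE_SIZE, CHAIN, merge=True))
```

In this toy chain, the two actors flagged `in_place` share one region with the source buffer, so only the last actor needs additional memory. The figures produced here say nothing about the article's experimental results.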
On Memory Reuse Between Inputs and Outputs of Dataflow Actors