Abstract
Synchronous dataflow graphs (SDFGs) are widely used to model digital signal processing and multimedia applications. Self-timed execution is an efficient methodology for the analysis and scheduling of SDFGs. In this article, we propose a communication-aware self-timed execution approach to the problem of scheduling SDFGs on multicore systems with communication delays. Based on this approach, we propose four communication-aware scheduling algorithms that differ in their allocation rules. Furthermore, we propose a code-size-aware mapping heuristic and use it jointly with one of the proposed scheduling algorithms to reduce the code size of SDFGs on multicore systems. In experiments on several applications, the proposed scheduling algorithms perform better than existing algorithms in terms of throughput and runtime. The experiments also show that the code-size-aware mapping approach can achieve significant code size reduction with limited throughput degradation in most cases.
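For readers unfamiliar with self-timed execution, the following is a minimal sketch of the basic idea only: every actor fires as soon as it is idle and each of its input edges holds enough tokens. It is not the communication-aware approach proposed in the article. It assumes unlimited processors, zero communication delay, integer execution times of at least 1, and no overlapping firings of the same actor. All names (`Edge`, `self_timed_trace`, `EXEC_TIME`, `EDGES`) and the two-actor graph are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str      # producing actor
    dst: str      # consuming actor
    produce: int  # tokens produced per firing of src
    consume: int  # tokens consumed per firing of dst
    tokens: int   # initial tokens on the edge


def self_timed_trace(exec_time, edges, horizon):
    """Simulate plain self-timed execution for time steps 0 .. horizon-1.

    Returns a list of (start_time, actor) firings in start order.
    Assumptions (illustrative, not from the article): unlimited
    processors, no communication delay, no auto-concurrency.
    """
    tokens = [e.tokens for e in edges]
    busy_until = {actor: 0 for actor in exec_time}
    pending = []  # (finish_time, actor) for firings in progress
    trace = []
    for t in range(horizon):
        # Firings that finish now deposit their output tokens.
        finished = [actor for finish, actor in pending if finish == t]
        pending = [(f, a) for f, a in pending if f != t]
        for actor in finished:
            for i, e in enumerate(edges):
                if e.src == actor:
                    tokens[i] += e.produce
        # Every idle actor with enough input tokens starts firing.
        for actor, duration in exec_time.items():
            if busy_until[actor] > t:
                continue
            inputs = [i for i, e in enumerate(edges) if e.dst == actor]
            if all(tokens[i] >= edges[i].consume for i in inputs):
                for i in inputs:
                    tokens[i] -= edges[i].consume
                busy_until[actor] = t + duration
                pending.append((t + duration, actor))
                trace.append((t, actor))
    return trace


# Hypothetical two-actor cycle: A fires once for every two firings of B.
EXEC_TIME = {"A": 1, "B": 2}
EDGES = [
    Edge("A", "B", produce=2, consume=1, tokens=0),
    Edge("B", "A", produce=1, consume=2, tokens=2),
]

if __name__ == "__main__":
    for start, actor in self_timed_trace(EXEC_TIME, EDGES, horizon=10):
        print(start, actor)
```

In this toy graph the firing pattern settles into a repeating period of five time units containing one firing of A and two of B. A communication-aware variant would additionally have to account for token transfer time between cores and for the limited number of cores, which this sketch deliberately leaves out.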
Index Terms
Code-size-aware Scheduling of Synchronous Dataflow Graphs on Multicore Systems