Abstract
Multiprocessor systems are becoming ubiquitous in today’s embedded systems design. In this article, we address the problem of mapping an application represented by a Homogeneous Synchronous Dataflow (HSDF) graph onto a real-time multiprocessor platform with the objective of maximizing total throughput. We propose that an optimal solution to the problem is composed of three components: actor-to-processor mapping, retiming, and actor ordering on each processor. The entire problem is systematically modeled as a Boolean Satisfiability (SAT) problem, so that optimality of the solution is guaranteed in theory. To explore the vast solution space more efficiently, we develop a dedicated HSDF theory solver that exploits the special characteristics of timed HSDF graphs, and integrate it into the general search framework of the SAT solver. Two alternative integration methods based on branch-and-bound are presented to prune branches of the search space early, which greatly improves scalability. Extensive performance evaluation on synthetic examples and a case study on a realistic H.264 video decoder show that our approach provides as much as 76.9% throughput improvement and scales to industry-sized applications.
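For readers unfamiliar with timed HSDF graphs, the following is a minimal sketch of the quantity being optimized. It is not the article's SAT encoding or its theory solver. It relies on the standard result that, with enough processors, the throughput of a timed HSDF graph is the inverse of its maximum cycle ratio (total execution time on a cycle divided by the initial tokens on that cycle), and that a legal retiming moves tokens between edges without changing the token count of any cycle. The graph, the actor names, the execution times, and the helper names `max_cycle_ratio` and `retime` are all hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def max_cycle_ratio(exec_time, edges):
    """Max over simple cycles of (sum of actor times) / (sum of initial tokens).

    exec_time: dict actor -> integer execution time
    edges: list of (src, dst, initial_tokens)
    Brute-force cycle enumeration; only suitable for tiny illustrative graphs.
    """
    succ = {}
    for src, dst, tokens in edges:
        succ.setdefault(src, []).append((dst, tokens))

    best = None

    def dfs(start, node, time_sum, token_sum, visited):
        nonlocal best
        for nxt, tokens in succ.get(node, []):
            if nxt == start:
                total = token_sum + tokens
                if total == 0:
                    raise ValueError("token-free cycle: the graph deadlocks")
                ratio = Fraction(time_sum, total)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited and nxt > start:
                # Only visit actors larger than the start actor, so each
                # cycle is enumerated once, from its smallest actor.
                dfs(start, nxt, time_sum + exec_time[nxt],
                    token_sum + tokens, visited | {nxt})

    for actor in sorted(exec_time):
        dfs(actor, actor, exec_time[actor], 0, {actor})
    return best


def retime(edges, r):
    """Apply retiming r: tokens on edge u -> v become d + r[v] - r[u]."""
    new_edges = [(u, v, d + r[v] - r[u]) for u, v, d in edges]
    if any(d < 0 for _, _, d in new_edges):
        raise ValueError("illegal retiming: negative token count")
    return new_edges


# Hypothetical three-actor graph with two cycles: A-B-C-A and A-B-A.
exec_time = {"A": 2, "B": 3, "C": 1}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2), ("B", "A", 1)]

mcr = max_cycle_ratio(exec_time, edges)        # cycle A-B-A: (2 + 3) / 1 = 5
throughput = 1 / mcr                           # Fraction(1, 5) iterations per time unit

retimed = retime(edges, {"A": 0, "B": 1, "C": 1})
mcr_retimed = max_cycle_ratio(exec_time, retimed)  # unchanged: still 5
```

In this sketch retiming leaves the cycle bound untouched. The bound assumes unlimited processors, so it ignores the mapping and per-processor ordering decisions that the article also optimizes.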
An Efficient Technique of Application Mapping and Scheduling on Real-Time Multiprocessor Systems for Throughput Optimization