
An Efficient Technique of Application Mapping and Scheduling on Real-Time Multiprocessor Systems for Throughput Optimization

Published: 02 August 2016

Abstract

Multiprocessor systems are becoming ubiquitous in today’s embedded systems design. In this article, we address the problem of mapping an application represented by a Homogeneous Synchronous Dataflow (HSDF) graph onto a real-time multiprocessor platform with the objective of maximizing total throughput. We propose that the optimal solution to the problem is composed of three components: actor-to-processor mapping, retiming, and actor ordering on each processor. The entire problem is systematically modeled into a Boolean Satisfiability (SAT) problem such that the optimal solution can be guaranteed theoretically. In order to explore the vast solution space more efficiently, we develop a specific HSDF theory solver based on the special characteristics of the timed HSDF, and integrate it into the general search framework of the SAT solver. Two alternative integration methods based on branch-and-bound are presented to achieve early branch pruning in the search space; thus, the scalability is greatly improved. Extensive performance evaluation on synthetic examples and a case study on the realistic H.264 Video Decoder show that our approach provides as much as 76.9% throughput improvement, and is scalable to industry-sized applications.
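As background for the throughput objective: a standard result for timed HSDF graphs is that, with unlimited processors, the best achievable throughput is the reciprocal of the maximum cycle ratio, i.e. the largest value of (total execution time of the actors on a cycle) / (total initial tokens on that cycle). The sketch below illustrates only that bound by brute-force enumeration of simple cycles. It is not the SAT-based technique of the article, and the function names, the graph encoding, and the example graph are all assumptions made for illustration.

```python
from fractions import Fraction


def max_cycle_ratio(exec_time, edges):
    """Return the maximum cycle ratio of a timed HSDF graph, or None if acyclic.

    exec_time: dict mapping actor name -> integer execution time (assumed encoding)
    edges:     list of (src, dst, initial_tokens) tuples (assumed encoding)
    """
    adj = {}
    for src, dst, tokens in edges:
        adj.setdefault(src, []).append((dst, tokens))

    # Fix an actor order so each simple cycle is enumerated exactly once,
    # starting from its lowest-ranked actor.
    rank = {actor: i for i, actor in enumerate(sorted(exec_time))}
    best = None

    def dfs(start, node, time_sum, token_sum, visited):
        nonlocal best
        for nxt, tokens in adj.get(node, []):
            if nxt == start:
                total_tokens = token_sum + tokens
                if total_tokens == 0:
                    raise ValueError("cycle without initial tokens: graph deadlocks")
                ratio = Fraction(time_sum, total_tokens)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited and rank[nxt] > rank[start]:
                dfs(start, nxt, time_sum + exec_time[nxt],
                    token_sum + tokens, visited | {nxt})

    for actor in exec_time:
        dfs(actor, actor, exec_time[actor], 0, {actor})
    return best


def throughput_bound(exec_time, edges):
    """Upper bound on throughput (iterations per time unit), or None if acyclic."""
    mcr = max_cycle_ratio(exec_time, edges)
    return None if mcr is None else 1 / mcr


# Hypothetical three-actor graph, invented for this example.
example_times = {"A": 2, "B": 3, "C": 1}
example_edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 1), ("B", "A", 2)]
print(throughput_bound(example_times, example_edges))
```

In the example, the cycle A, B, C has ratio 6/1 and the cycle A, B has ratio 5/2, so the first cycle is critical. Cycle enumeration is exponential in the worst case; it is used here only to keep the sketch short, and polynomial maximum cycle mean algorithms exist for realistic graph sizes. Mapping actors to a limited number of processors and fixing their order, as the article does, adds further constraints that can only lower throughput relative to this bound.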

