Abstract
In this article, we partition and schedule Synchronous Dataflow (SDF) graphs onto heterogeneous execution architectures so as to minimize energy consumption and maximize throughput. Partitioning and scheduling SDF graphs onto homogeneous architectures is a well-known NP-hard problem, and the heterogeneity of the execution architecture makes our problem harder still. We combine the two objectives into a weighted sum and solve the resulting problem using a novel state-space exploration inspired by the theory of parallel automata. The resulting heuristic algorithm produces good schedules when implemented in an existing stream framework.
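To make the weighted-sum formulation concrete, the following is a minimal sketch and not the article's algorithm. It assumes a toy model in which each actor has a per-processor execution time and energy cost, ignores data dependencies, firing rates, and communication, and uses the makespan as a stand-in for inverse throughput. The names `weighted_cost`, `best_mapping`, and `alpha`, and all numbers, are hypothetical. It enumerates every mapping exhaustively, which is exactly what a heuristic state-space exploration is meant to avoid on realistic graphs.

```python
from itertools import product


def weighted_cost(mapping, exec_time, energy, alpha):
    """Weighted sum of total energy and makespan for one actor-to-processor mapping.

    mapping[i] is the processor index assigned to actor i (hypothetical model).
    alpha = 1.0 optimises energy only; alpha = 0.0 optimises makespan only.
    """
    n_procs = len(exec_time[0])
    load = [0.0] * n_procs
    total_energy = 0.0
    for actor, proc in enumerate(mapping):
        load[proc] += exec_time[actor][proc]
        total_energy += energy[actor][proc]
    makespan = max(load)  # a lower makespan stands in for higher throughput
    return alpha * total_energy + (1.0 - alpha) * makespan


def best_mapping(exec_time, energy, alpha):
    """Exhaustively search all mappings; only feasible for very small graphs."""
    n_actors = len(exec_time)
    n_procs = len(exec_time[0])
    candidates = product(range(n_procs), repeat=n_actors)
    return min(candidates, key=lambda m: weighted_cost(m, exec_time, energy, alpha))


# Illustrative data: processor 0 is fast but power-hungry, processor 1 is slow but frugal.
exec_time = [[2, 4], [3, 6], [1, 2]]  # exec_time[actor][processor]
energy = [[8, 2], [9, 3], [4, 1]]     # energy[actor][processor]

print(best_mapping(exec_time, energy, alpha=1.0))  # energy only → (1, 1, 1)
print(best_mapping(exec_time, energy, alpha=0.0))  # makespan only → (1, 0, 0)
```

Sweeping `alpha` between 0 and 1 trades one objective against the other; in practice the two terms would need to be normalised to comparable scales before being summed.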
Index Terms
Heuristics on Reachability Trees for Bicriteria Scheduling of Stream Graphs on Heterogeneous Multiprocessor Architectures