ABSTRACT
We present a methodology for synthesizing streaming applications, modeled as task graphs, for pipelined execution on multi-core architectures. We develop a task graph extraction and characterization framework that accurately determines the structure, computation, and communication characteristics of an application's task graph from its specification in C. Furthermore, we develop a provably optimal algorithm that jointly balances the workload assigned to each core and minimizes inter-core communication traffic. Experimental results show that our versatile method significantly improves the throughput of streaming applications under a variety of hardware configurations.
Joint throughput and energy optimization for pipelined execution of embedded streaming applications
Proceedings of the 2007 LCTES conference