Abstract
Streaming applications often require a parallel Model of Computation (MoC) to specify their behavior and to facilitate mapping onto Multi-Processor System-on-Chip (MPSoC) platforms. The diverse performance requirements and resource budgets of embedded systems call for an efficient design space exploration (DSE) approach that selects the best design from a large number of design choices. However, existing DSE approaches explore only architecture and mapping alternatives for the initial application specification given by the application designer. In this article, we first show that a design may often be suboptimal if alternative specifications of a given application are not taken into account. We further argue that the best alternative specification consists only of independent and load-balanced application tasks. Based on the Polyhedral Process Network (PPN) MoC, we present an approach to analyze an initial PPN and, where possible, transform it into an alternative one that contains only independent processes. Finally, by prototyping real-life applications on both FPGA-based MPSoCs and desktop multi-core platforms, we demonstrate that mapping the alternative application specification yields a large performance gain compared to approaches that do not take alternative application specifications into account.
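The argument for independent, load-balanced processes can be illustrated with a toy cost model. The sketch below is a hypothetical illustration, not the authors' analysis or tool flow. It assumes a three-stage linear chain with made-up per-token costs, one process per processor, unbounded FIFOs, zero communication cost, and iterations that are already known to be independent (establishing that independence is the job of the PPN analysis). The function names `pipeline_makespan`, `balanced_chunks`, and `independent_makespan` are invented for this example.

```python
from typing import List


def pipeline_makespan(costs: List[int], n_tokens: int) -> int:
    """Completion time of a linear pipeline with one stage per processor.

    Toy assumptions: unbounded FIFOs between stages and no communication cost.
    """
    # finish[i] is the time at which stage i finished its most recent token.
    finish = [0] * len(costs)
    for _ in range(n_tokens):
        ready = 0  # time the current token is available to stage 0
        for i, c in enumerate(costs):
            # A stage starts a token once the token has arrived and the stage is idle.
            finish[i] = max(ready, finish[i]) + c
            ready = finish[i]
    return finish[-1]


def balanced_chunks(n_tokens: int, n_procs: int) -> List[int]:
    """Split n_tokens independent iterations into n_procs near-equal chunks."""
    base, extra = divmod(n_tokens, n_procs)
    return [base + 1 if p < extra else base for p in range(n_procs)]


def independent_makespan(costs: List[int], chunks: List[int]) -> int:
    """Each processor runs the whole chain on its own chunk of iterations.

    There are no channels between processors, so the largest chunk dominates.
    """
    return max(chunks) * sum(costs)


if __name__ == "__main__":
    costs = [1, 5, 2]  # hypothetical per-token cost of each stage
    print(pipeline_makespan(costs, 12))                         # 63
    print(independent_makespan(costs, balanced_chunks(12, 3)))  # 32
    print(independent_makespan(costs, [6, 3, 3]))               # 48
```

Under these assumptions the pipelined mapping is bound by its slowest stage (cost 5 per token), while the independent split is bound only by its largest chunk. The unbalanced split `[6, 3, 3]` shows that independence alone is not enough: the processes must also be load-balanced.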
Mapping of streaming applications considering alternative application specifications