Abstract
Process Networks (PNs) are a parallel model of computation (MoC) well suited to specifying embedded streaming applications in a parallel form, which facilitates efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a very difficult and highly error-prone task. To overcome these difficulties, we have developed the pn compiler, which derives Polyhedral Process Network (PPN) parallel specifications from sequential static affine nested loop programs (SANLPs). However, many applications, for example multimedia applications (MPEG coders/decoders, smart cameras, etc.), have adaptive and dynamic behavior that cannot be expressed as SANLPs. Therefore, in order to handle dynamic multimedia applications, in this article we address the important question of whether we can relax some of the restrictions of SANLPs while keeping the ability to perform compile-time analysis and to derive PPNs. Achieving this would significantly extend the range of applications that can be parallelized in an automated way.
The main contribution of this article is a first approach for the automated translation of affine nested loop programs with dynamic loop bounds into input-output equivalent Polyhedral Process Networks. In addition, we present a method for analyzing the execution overhead introduced in the PPNs derived from programs with dynamic loop bounds. The automated translation approach has been evaluated by deriving a PPN parallel specification from a real-life application called Low Speed Obstacle Detection (LSOD), used in the smart cameras domain. The results obtained by executing the derived PPN indicate that the approach presented in this article facilitates efficient parallel implementations of sequential nested loop programs with dynamic loop bounds. That is, our approach exposes the parallelism available in such applications, which allows multiple cores to be used efficiently.
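To make the notions of a dynamic loop bound, input-output equivalence, and execution overhead concrete, the following is a minimal sketch and not the translation performed by the pn compiler or described in the article. It assumes that the dynamic bound has a known static maximum (`MAX_N`), that the bound is forwarded to the consumer over a separate control FIFO, and that both processes iterate over the static over-approximation with a guard. All names (`bound_of`, `producer`, `consumer`, `run_network`) are hypothetical.

```python
import queue
import threading

MAX_N = 4  # assumed static upper bound on the dynamic inner-loop bound


def bound_of(i):
    # Hypothetical run-time value (e.g. derived from input data); 0 <= n <= MAX_N.
    return (i * 3) % (MAX_N + 1)


def sequential(rows):
    # Reference program: the inner loop bound n is only known at run time.
    out = []
    for i in range(rows):
        n = bound_of(i)
        for j in range(n):
            out.append(2 * (10 * i + j))  # stand-in for two chained functions
    return out


def producer(rows, ctrl, data):
    # First process: iterates over the static bound and guards each iteration.
    for i in range(rows):
        n = bound_of(i)
        ctrl.put(n)  # forward the dynamic bound as a control token
        for j in range(MAX_N):
            if j < n:
                data.put(10 * i + j)


def consumer(rows, ctrl, data, out, wasted):
    # Second process: same static iteration space, guarded by the control token.
    for i in range(rows):
        n = ctrl.get()
        for j in range(MAX_N):
            if j < n:
                out.append(2 * data.get())
            else:
                wasted[0] += 1  # empty iteration: overhead of over-approximation


def run_network(rows):
    # Bounded FIFOs connect the two processes, as in a process network.
    ctrl, data = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    out, wasted = [], [0]
    threads = [
        threading.Thread(target=producer, args=(rows, ctrl, data)),
        threading.Thread(target=consumer, args=(rows, ctrl, data, out, wasted)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, wasted[0]
```

Calling `run_network(5)` and comparing its first result with `sequential(5)` checks input-output equivalence for this toy case. The second result counts the guarded iterations that did no work, which is one simple way to quantify the overhead of iterating over a static over-approximation of a dynamic bound.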
Index Terms
Automated generation of polyhedral process networks from affine nested-loop programs with dynamic loop bounds