Abstract
Some loops with cross-iteration dependences can execute in parallel by pipelining. The loop body is partitioned into stages such that the data dependences are not violated and then the stages are mapped onto threads. Two well-known mapping techniques are fixed code and fixed data; they achieve high performance for load-balanced loops, but they fail to perform well for load-imbalanced loops. In this article, we present a novel hybrid mapping that eliminates drawbacks of both prior mapping techniques and enables dynamic scheduling of stages.
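The two prior mappings named in the abstract can be illustrated with a small sketch. It assumes the usual reading from the pipelining literature, which the abstract itself does not spell out: under fixed code, each thread owns one stage and runs it for every iteration; under fixed data, each thread owns whole iterations and runs every stage of them. The helper names, the round-robin iteration assignment, and the dependence model in `makespan` (every stage waits for the previous stage of the same iteration and for the same stage of the previous iteration) are illustrative assumptions, not the article's method.

```python
from typing import Callable, Dict, List, Tuple

WorkItem = Tuple[int, int]  # (iteration, stage)
Mapping = Dict[int, List[WorkItem]]  # thread id -> ordered work items


def fixed_code_mapping(num_iters: int, num_stages: int) -> Mapping:
    """Assumed fixed code mapping: thread s runs stage s of every iteration."""
    return {s: [(i, s) for i in range(num_iters)] for s in range(num_stages)}


def fixed_data_mapping(num_iters: int, num_stages: int, num_threads: int) -> Mapping:
    """Assumed fixed data mapping: thread t runs all stages of iterations
    t, t + num_threads, ... (round-robin assignment is an assumption)."""
    return {
        t: [(i, s) for i in range(t, num_iters, num_threads) for s in range(num_stages)]
        for t in range(num_threads)
    }


def makespan(mapping: Mapping, cost: Callable[[int, int], int]) -> int:
    """Finish time when each thread runs its items in order and item (i, s)
    waits for (i, s - 1) and (i - 1, s). This dependence model is a
    conservative assumption: every stage is treated as serial."""
    finish: Dict[WorkItem, int] = {}
    pos = {t: 0 for t in mapping}    # next item index per thread
    clock = {t: 0 for t in mapping}  # time at which each thread becomes free
    remaining = sum(len(items) for items in mapping.values())
    while remaining:
        progressed = False
        for t, items in mapping.items():
            while pos[t] < len(items):
                i, s = items[pos[t]]
                deps = [d for d in ((i, s - 1), (i - 1, s)) if d[0] >= 0 and d[1] >= 0]
                if any(d not in finish for d in deps):
                    break  # thread t stalls until another thread produces its input
                start = max([clock[t]] + [finish[d] for d in deps])
                finish[(i, s)] = clock[t] = start + cost(i, s)
                pos[t] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("mapping deadlocks under the assumed dependences")
    return max(finish.values())


if __name__ == "__main__":
    uniform = lambda i, s: 1  # load-balanced case: every stage costs one time unit
    print(makespan(fixed_code_mapping(4, 2), uniform))
    print(makespan(fixed_data_mapping(4, 2, 2), uniform))
```

Passing a non-uniform `cost` function (for example, one stage that is occasionally much slower) is one way to explore how each mapping reacts to load imbalance within this simplified model.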
Index Terms
Unifying fixed code and fixed data mapping of load-imbalanced pipelined loops
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming