Abstract
Dynamic parallelism (DP) is a new feature of emerging GPUs that allows new kernels to be generated and scheduled from the device side (GPU) without host-side (CPU) intervention. To efficiently support DP, one of the major challenges is to saturate the GPU processing elements and provide them with the required data in a timely fashion. In this paper, we first conduct a limit study on the performance improvements that can be achieved by hardware schedulers that are provided with accurate data reuse information. We next propose LASER, a Locality-Aware SchedulER, where the hardware schedulers employ data reuse monitors to help make scheduling decisions to improve data locality at runtime. Experimental results on 16 benchmarks show that LASER, on average, can improve performance by 11.3%.





