Abstract
Memory performance is an essential factor in tapping the full potential of the massive parallelism of GPUs, and it has motivated recent efforts in GPU cache modeling. This paper presents a new data-centric way to model the performance of a system with heterogeneous memory resources. The new model is composable: it can predict the performance difference caused by placing data differently after profiling the execution just once.
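To make the idea of composability concrete, the following is a minimal sketch and not the paper's model. It assumes a one-time profile that records an access count per data array, a fixed per-access cost for each memory type, and a limit of one array in shared memory; the names `profile`, `cost_per_access`, `SHARED_SLOTS`, `predicted_cost`, and `best_placement`, and all numbers, are hypothetical. Under those assumptions, the cost of any placement can be composed from the single profile without re-running the program.

```python
from itertools import product

# Hypothetical one-time profile: number of accesses to each data array.
profile = {"A": 1000, "B": 400, "C": 50}

# Hypothetical per-access cost (arbitrary units) of each memory type.
cost_per_access = {"global": 10, "texture": 6, "shared": 2}

# Assumed capacity limit: at most this many arrays fit in shared memory.
SHARED_SLOTS = 1


def predicted_cost(placement):
    """Compose a placement's cost from the single profile (no re-profiling)."""
    return sum(profile[arr] * cost_per_access[mem] for arr, mem in placement.items())


def best_placement():
    """Enumerate all feasible placements and return the cheapest predicted one."""
    arrays = sorted(profile)
    feasible = []
    for mems in product(cost_per_access, repeat=len(arrays)):
        if mems.count("shared") <= SHARED_SLOTS:
            feasible.append(dict(zip(arrays, mems)))
    return min(feasible, key=predicted_cost)


if __name__ == "__main__":
    choice = best_placement()
    print(choice, predicted_cost(choice))
```

Exhaustive enumeration is used here only because the toy search space is tiny; a real cost model would account for caching and contention effects that this additive sketch ignores.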
Data-centric combinatorial optimization of parallel code
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming