Abstract
This paper describes Surge, a collection-oriented programming model that enables programmers to compose parallel computations using nested high-level data collections and operators. Surge exposes a code generation interface, decoupled from the core computation, that enables programmers and autotuners to easily generate multiple implementations of the same computation on various parallel architectures such as multi-core CPUs and GPUs. Because computations are decoupled from architecture-specific implementation, programmers can target multiple architectures more easily and generate a search space that facilitates optimization and customization for specific architectures. We express four real-world benchmarks in Surge, from domains such as sparse linear algebra and machine learning, and generate OpenMP and CUDA C++ implementations from the same performance-portable specification. Surge generates efficient, scalable code that achieves up to 1.32x speedup over handcrafted, well-optimized CUDA code.
A collection-oriented programming model for performance portability
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming