Abstract
This work proposes a low-overhead half-barrier pattern for scheduling fine-grain parallel loops and considers its integration into the Intel OpenMP and Cilkplus schedulers. Experimental evaluation demonstrates that the scheduling overhead of our techniques is 43% lower than that of Intel OpenMP and 12.1x lower than that of Cilk. We observe a 22% speedup on 48 threads, with a peak speedup of 2.8x.
Reducing the burden of parallel loop schedulers for many-core processors
PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming