Abstract
In distributed computing, parallel overheads such as synchronization overhead can hinder performance. We introduce the idea of Distributed Control (DC), in which global synchronization is reduced to termination detection and each worker proceeds optimistically, based on its local knowledge of the global computation. To avoid "wasted" work, DC relies on local work prioritization. However, the work order obtained by local prioritization is susceptible to interference from the runtime. We show that employing effective scheduling policies and optimizations in the runtime, in conjunction with eliminating global barriers, improves performance in two graph applications: single-source shortest paths and connected components.
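To make the idea concrete, here is a minimal single-process simulation of a DC-style single-source shortest paths computation. It is a sketch under stated assumptions, not the authors' implementation: vertices are assigned to workers by `v % num_workers`, each worker keeps a local priority queue and an inbox, and workers relax edges without any global barrier between rounds. Global synchronization is reduced to a termination check: all local queues empty and no messages in flight. The function name `dc_sssp`, the `batch` parameter, the modulo ownership scheme, and the centralized in-flight counter (standing in for a real distributed termination detection algorithm) are all hypothetical.

```python
import heapq
from collections import deque
from math import inf


def dc_sssp(num_vertices, edges, source, num_workers=4, batch=2):
    """Hypothetical sketch of Distributed-Control-style SSSP.

    edges: list of (u, v, w) directed edges with non-negative weights.
    Vertex v is owned by worker v % num_workers (assumed partitioning).
    """
    adj = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        adj[u].append((v, w))

    def owner(v):
        return v % num_workers

    dist = [inf] * num_vertices                       # dist[v] written only by owner(v)
    queues = [[] for _ in range(num_workers)]         # local priority queues
    inboxes = [deque() for _ in range(num_workers)]   # incoming relaxation messages
    in_flight = 0  # messages sent but not yet received (stand-in termination counter)

    inboxes[owner(source)].append((0, source))
    in_flight += 1

    while True:
        # Termination detection: every worker idle and no message in flight.
        if in_flight == 0 and all(not q for q in queues):
            break
        for wid in range(num_workers):
            # Receive messages; keep only those that improve the owner's distance.
            while inboxes[wid]:
                d, v = inboxes[wid].popleft()
                in_flight -= 1
                if d < dist[v]:
                    dist[v] = d
                    heapq.heappush(queues[wid], (d, v))
            # Optimistically process the locally best items; no global barrier.
            for _ in range(batch):
                if not queues[wid]:
                    break
                d, v = heapq.heappop(queues[wid])
                if d > dist[v]:
                    continue  # stale entry: a better distance already arrived
                for nbr, w in adj[v]:
                    inboxes[owner(nbr)].append((d + w, nbr))
                    in_flight += 1
    return dist


edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(dc_sssp(5, edges, 0, num_workers=2))  # vertex 4 is unreachable
```

Relaxations that are later superseded (popped entries with `d > dist[v]`, or messages that do not improve a distance) correspond to the "wasted" work that local prioritization tries to limit; `batch` crudely models how long the runtime lets one worker run before others interleave. The same skeleton could plausibly be adapted to connected components by propagating minimum vertex labels instead of distances.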
POSTER: Distributed Control: The Benefits of Eliminating Global Synchronization via Effective Scheduling
PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming