ABSTRACT
Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increase significantly with the number of hops traveled. Yet we and others have recently shown that, in the presence of contention, message latencies can grow substantially. Hence, task mapping strategies on large machines should take the machine's topology into account. This poster presents a uniform API that provides topology information on machines with 3D torus networks, such as IBM Blue Gene and Cray XT. We present techniques that use this API to improve performance. User-level codes can call the API to obtain information about allocated partitions at runtime, which is essential for mapping.
We use a simple 3D Stencil kernel to show why it is important to consider network topology. We then present mapping strategies for a production code, OpenAtom, running on three-dimensional torus and mesh topologies. OpenAtom presents complex communication scenarios in which multiple groups of objects interact. We present results for 3D Stencil and OpenAtom on up to 16,384 processors of Blue Gene/L, 8,192 processors of Blue Gene/P, and 2,048 processors of Cray XT3.
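To make the motivation concrete, the following is a minimal sketch of how hop counts on a 3D torus depend on task placement for a 3D Stencil communication pattern. It is not the API presented in the poster: the function names (`torus_hops`, `stencil_neighbors`, `average_hops`), the 4 x 4 x 4 torus size, and the one-task-per-processor assumption are all hypothetical choices made for illustration.

```python
from itertools import product
import random


def torus_hops(a, b, dims):
    """Minimum hop count between coordinates a and b on a torus.

    Each dimension has wraparound links, so the distance along a
    dimension of size d is the shorter of the two ways around the ring.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def stencil_neighbors(task, dims):
    """Yield the six nearest neighbors of a task in a periodic 3D stencil."""
    for axis in range(3):
        for step in (-1, 1):
            neighbor = list(task)
            neighbor[axis] = (neighbor[axis] + step) % dims[axis]
            yield tuple(neighbor)


def average_hops(mapping, dims):
    """Mean hops per stencil message for a task -> processor mapping."""
    total = 0
    count = 0
    for task, proc in mapping.items():
        for neighbor in stencil_neighbors(task, dims):
            total += torus_hops(proc, mapping[neighbor], dims)
            count += 1
    return total / count


# Hypothetical 4 x 4 x 4 torus with one stencil task per processor.
dims = (4, 4, 4)
coords = list(product(*(range(d) for d in dims)))

# Topology-aware placement: task (x, y, z) runs on processor (x, y, z).
identity_map = {c: c for c in coords}

# Topology-oblivious placement: tasks scattered at random (fixed seed).
shuffled = coords[:]
random.Random(0).shuffle(shuffled)
random_map = dict(zip(coords, shuffled))

print("topology-aware mapping:", average_hops(identity_map, dims))
print("random mapping:        ", average_hops(random_map, dims))
```

Under the topology-aware placement every stencil message travels exactly one hop, while a random placement spreads neighbors across the torus, so messages cross more links and are more likely to contend for them. Average hops is used here only as a simple proxy for contention.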
Topology aware task mapping techniques: an API and case study (PPoPP '09)