Abstract
Many parallel languages presume a shared address space in which any portion of a computation can access any datum. Some parallel computers directly support this abstraction with hardware shared memory. Other computers provide distinct (per-processor) address spaces and communication mechanisms on which software can construct a shared address space. Since programmers have difficulty explicitly managing address spaces, there is considerable interest in compiler support for shared address spaces on the widely available message-passing computers.
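The software-constructed shared address space described above can be sketched in miniature. In this toy model (my illustration, not code from the paper — the class and method names are invented), a "shared" array is block-distributed across per-node private memories; a local access touches memory directly, while a remote access becomes an explicit request/reply message, here simulated by a direct method call:

```python
# Toy sketch of a compiler-constructed shared address space over
# message passing. All names (Node, shared_read, etc.) are illustrative
# assumptions; real systems use asynchronous messages, not method calls.

class Node:
    def __init__(self, rank, nnodes, total_len):
        self.rank = rank
        self.chunk = total_len // nnodes      # block distribution
        self.local = [0] * self.chunk         # this node's private memory
        self.peers = None                     # wired up after all nodes exist

    def owner(self, i):
        return i // self.chunk                # which node holds element i

    def handle(self, msg):
        # Message handler: services read/write requests from other nodes.
        kind, index, value = msg
        li = index - self.rank * self.chunk   # translate to local offset
        if kind == "read":
            return self.local[li]
        self.local[li] = value

    def shared_read(self, i):
        # What a compiler emits for a shared-memory load: local accesses
        # hit memory directly; remote ones turn into messages.
        o = self.owner(i)
        if o == self.rank:
            return self.local[i - self.rank * self.chunk]
        return self.peers[o].handle(("read", i, None))  # simulated send/recv

    def shared_write(self, i, v):
        o = self.owner(i)
        if o == self.rank:
            self.local[i - self.rank * self.chunk] = v
        else:
            self.peers[o].handle(("write", i, v))

def make_machine(nnodes, total_len):
    nodes = [Node(r, nnodes, total_len) for r in range(nnodes)]
    for n in nodes:
        n.peers = nodes
    return nodes
```

Every node sees one flat index space, even though no memory is physically shared — which is exactly the abstraction the compiler must synthesize on a message-passing machine.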
At first glance, it might appear that hardware-implemented shared memory is unquestionably a better base on which to implement a language. This paper argues, however, that compiler-implemented shared memory, despite its shortcomings, has the potential to exploit the resources of a parallel computer more effectively. Hardware designers need to find mechanisms that combine the advantages of both approaches in a single system.
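One way a compiler can outperform hardware coherence, as the argument above suggests, is by using its global view of the program to aggregate communication: where hardware fetches remote data one cache line at a time on demand, a compiler that sees a whole loop can fetch the entire remote block in a single message. The sketch below (my illustration, with invented names and a counter standing in for real message costs) contrasts the two:

```python
# Illustrative sketch: per-element remote loads (hardware-style demand
# misses) vs. one compiler-scheduled bulk transfer. "messages" is a stand-in
# for communication cost; all names here are assumptions for illustration.

messages = {"count": 0, "elements": 0}

def remote_fetch(block, lo, hi):
    # One message fetches any contiguous span [lo, hi) of the remote block.
    messages["count"] += 1
    messages["elements"] += hi - lo
    return block[lo:hi]

def naive_sum(block):
    # Demand-driven: every access is a separate small transfer.
    return sum(remote_fetch(block, i, i + 1)[0] for i in range(len(block)))

def compiled_sum(block):
    # Compiler-scheduled: prefetch the whole block once, then run locally.
    local_copy = remote_fetch(block, 0, len(block))
    return sum(local_copy)
```

Both versions compute the same sum, but `naive_sum` pays one message per element while `compiled_sum` pays one message total — the kind of transformation only a system with program-level knowledge can apply.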
Index Terms
Compiling for shared-memory and message-passing computers