DOI: 10.5555/110382
Supercomputing '90: Proceedings of the 1990 ACM/IEEE conference on Supercomputing
1990 Proceeding
Publisher:
  • IEEE Computer Society Press
  • Washington
  • DC
  • United States
Conference:
SC '90: International Conference for High Performance Computing, Networking, Storage and Analysis, New York, New York, USA, November 12–16, 1990
ISBN:
978-0-89791-412-3
Published:
12 November 1990
Sponsors:
SIGARCH, IEEE-CS

Article
Free
LAPACK: a portable linear algebra library for high-performance computers
pp 2–11

The goal of the LAPACK project is to design and implement a portable linear algebra library for efficient use on a variety of high-performance computers. The library is based on the widely used LINPACK and EISPACK packages for solving linear equations, ...

Article
Free
Hierarchical blocking and data flow analysis for numerical linear algebra
pp 12–19

The optimization of BLAS2 and BLAS3 for linear algebra on computers with hierarchical memory systems is discussed. A new blocking strategy called hierarchical blocking and data flow analysis is proposed and its applications are given. Numerical results ...

Article
Free
Multilinear algebra and parallel programming
pp 20–31

We report on preliminary results of a joint project of the Center for Large Scale Computation at the City University of New York and the Department of Computer and Information Sciences at The Ohio State University to study the use of multilinear algebra ...

Article
Free
The impact of memory organization on the performance of matrix multiplication
pp 34–40

Matrix multiplication may be considered as a model problem for analyzing the performance of more complex algorithms. On CRAY and IBM computer systems, there are library routines which for this task operate at high megaflop rates. Other programs from ...

Article
Free
A linear array of processors with partially shared memory for parallel solution of PDE
pp 41–48

We propose a multiprocessor architecture with partially shared memory blocks, which is, we think, best suited for the successive approximation of scientific computing problems, such as matrix operations, partial differential equations etc. The topology ...

Article
Free
On randomly interleaved memories
pp 49–58

Memory address interleaving, where an address k generated by a processor is mapped into the memory bank k (mod m), is a basic technique for increasing memory bandwidth. However, the access conflicts that can occur in interleaved memories sometimes ...

Article
Free
Tracing application program execution on the Cray X-MP and Cray 2
pp 60–73

Important insights into program operation can be gained by observing dynamic execution behavior. Unfortunately, many high-performance machines provide execution profile summaries as the only tool for performance investigation. We have developed a ...

Article
Free
Parallel program debugging with on-the-fly anomaly detection
pp 74–81

We describe an approach for parallel debugging that coordinates static analysis with efficient on-the-fly access anomaly detection. We are developing on-the-fly instrumentation mechanisms for the structured synchronization primitives of Parallel ...

Article
Free
Improving instruction cache behavior by reducing cache pollution
pp 82–91

In this paper we describe compiler techniques for improving instruction cache performance. Through repositioning of the code in main memory, leaving memory locations unused, code duplication, and code propagation, the effectiveness of the cache can be ...

Article
Free
A parallel Monte Carlo search algorithm for the conformational analysis of proteins
pp 94–102

In recent years several approaches have been proposed to overcome the multiple minima problem associated with non-linear optimization techniques used in the analysis of molecular conformations. One such technique based on a parallel Monte Carlo search ...

Article
Free
Folding RNA on the Cray-2
pp 103–111

Predicting RNA folding is a very computationally intensive task, that depends heavily on the assumptions of the model of folding. The 'stem list method' provides a flexible framework to change the assumptions of the model, but the price for this ...

Article
Free
A parallel computational approach using a cluster of IBM ES/3090 600Js for physical mapping of chromosomes
pp 112–121

A standard technique for mapping a chromosome is to randomly select pieces, to use restriction enzymes to cut these pieces into fragments, and then to use the fragments for estimating the probability of overlap of these pieces. We describe a ...

Article
Free
Experience with a performance analyzer for multithreaded applications
pp 124–131

Determining the effectiveness of parallelization requires performance data about elapsed process time and total CPU time. Furthermore, it is desirable not to have to run a parallel application in a stand-alone environment in order to obtain the profile. ...

Article
Free
Performance evaluation of the IBM RISC System/6000: comparison of an optimized scalar processor with two vector processors
pp 132–141

RISC System/6000 computers are workstations with a reduced instruction set processor recently developed by IBM. This report details the performance of the 6000-series computers as measured using a set of portable, standard-Fortran, computationally-...

Article
Free
The characterization of two scientific workloads using the CRAY X-MP performance monitor
pp 142–152

The weekend production period on a CRAY X-MP was monitored for several months at each of two supercomputing sites. The hardware performance monitor available on the X-MP was used to collect the data at each site. Various metrics are computed using the ...

Article
Free
Supercomputer network selection: a case study
pp 154–159

With the purchase of a Cray-2 supercomputer, Eli Lilly and Company (Lilly) needed a high performance network to provide communications with this computer. At the time of installation of the Cray, this network had to provide access from VAX/VMS computers ...

Article
Free
Very high performance networking for supercomputing
pp 160–168

NASA Ames has installed a very high bandwidth, 1 gigabit/second, local area network provided by Ultra Network Technologies to study the feasibility and performance of networking supercomputers with minisupercomputers and workstations at the effective ...

Article
Free
Cost-performance analysis of heterogeneity in supercomputer architectures
pp 169–177

Heterogeneity has appeared as a cost-effective approach to design high performance computers. This paper analyzes cost-performance of heterogeneity in supercomputer architectures. Queueing models are used to study performance of homogeneous and ...

Article
Free
Fast barrier synchronization hardware
pp 180–189

Many recent studies have considered the importance of barrier synchronization overhead on parallel loop performance, especially for large-scale parallel machines. This paper describes a hardware scheme for supporting fast barrier synchronization. It ...

Article
Free
Switch-stacks: a scheme for microtasking nested parallel loops
pp 190–199

This paper discusses run-time microtasking support for executing nested parallel loops on a shared memory multiprocessor system, and presents a new scheme called switch-stacks for implementing such support. We first discuss current approaches to flat ...

Article
Free
Parallelization of loops with exits on pipelined architectures
pp 200–212

Modulo scheduling theory can be applied successfully to overlap Fortran DO loops on pipelined computers issuing multiple operations per cycle both with and without special loop architectural support [1, 2, 3]. This paper shows that a broader class of ...

Article
Free
Computation of large-scale constrained matrix problems: the splitting equilibration algorithm
pp 214–223

The Constrained Matrix problem is a core problem in numerous applications in the social and economic sciences, including: the estimation of input-output tables, trade tables, and social/national accounts, the projection of migration flows over space and ...

Article
Free
High performance preconditioning on supercomputers for the 3D device simulator MINIMOS
pp 224–231

Discretization and iterative solution of the semiconductor equations in a three-dimensional rectangular region lead to very large sparse linear systems. Nevertheless, design engineers and scientists of device physics need reliable results in short time ...

Article
Free
Techniques for improving the performance of sparse matrix factorization on multiprocessor workstations
pp 232–241

In this paper we study the problem of factoring large sparse systems of equations on high-performance multiprocessor workstations. While these multiprocessor workstations are capable of very high peak floating point computation rates, most existing ...

Article
Free
Fault-tolerant routing in MIN-based supercomputers
pp 244–253

In this paper we study methods for routing data in supercomputers that use multistage interconnection networks (MINs), in the presence of faulty components in the network. These methods are applicable to existing multiprocessors like IBM GF11 and RP3. ...

Article
Free
Uni-directional hypercubes
pp 254–263

Uni-directional hypercubes are hypercube interconnection topologies with simplex uni-directional links. While accommodating a large number of nodes, uni-directional hypercubes require less complicated communication hardware than conventional bi-...

Article
Free
Design and analysis of buffered crossbars and banyans with cut-through switching
pp 264–273

The design and approximate analyses of discrete time buffered crossbar and banyans with cut-through switching are presented. The crossbar switches can contain either (1) input FIFO queueing, (2) input “bypass” queueing where the FIFO discipline is ...

Article
Free
A parallel object-oriented total architecture: A–NET
pp 276–285

A-NET is a parallel object-oriented total architecture for highly parallel computation. Starting with a computation model, this paper describes parallel constructs of the designed language, called A-NETL; the A-NETL oriented machine instruction set ...

Article
Free
A parallel computer model supporting procedure-based communication
pp 286–294

Procedure-based communication can convert variant communication patterns of parallel computation to a simple data sending and receiving process. This paper describes a general purpose, MIMD parallel architecture that effectively supports the procedure-...

Article
Free
A high-performance, memory-based interconnection system for multicomputer environments
pp 295–304

The objective of this paper is to outline the design and operation of a very high-performance, memory-mapped interconnection system, called Merlin. The design can be effectively utilized to interconnect processors in a wide variety of environments, ...

Contributors
  • Institute for Defense Analyses
  • Sandia National Laboratories, New Mexico

Acceptance Rates

Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%

Year                Submitted  Accepted  Rate
SC '17                    327        61   19%
SC '16                    442        81   18%
SC '15                    358        79   22%
SC '14                    394        83   21%
SC '13                    449        91   20%
SC '12                    461       100   22%
SC '11                    352        74   21%
SC '10                    253        51   20%
SC '09                    261        59   23%
SC '08                    277        59   21%
SC '07                    268        54   20%
SC '06                    239        54   23%
SC '05                    260        62   24%
SC '04                    200        60   30%
SC '03                    207        60   29%
SC '02                    230        67   29%
SC '01                    240        60   25%
SC '00                    179        62   35%
Supercomputing '95        241        69   29%
Supercomputing '93        300        72   24%
Supercomputing '92        220        75   34%
Supercomputing '91        215        83   39%
Overall                 6,373     1,516   24%