Abstract
The efforts of an expert to parallelize and optimize a dense linear algebra algorithm for distributed-memory targets are largely mechanical and repetitive. We demonstrate that these efforts can be encoded and automatically applied to obviate the manual implementation of many algorithms in high-performance code.
Mechanizing the expert dense linear algebra developer
PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming