 Georg Hager

Bibliometrics: publication history
Average citations per article: 8.06
Citation Count: 266
Publication count: 33
Publication years: 2002-2017
Available for download: 5
Average downloads per article: 152.60
Downloads (cumulative): 763
Downloads (12 Months): 209
Downloads (6 Weeks): 45

33 results found

Result 1 – 20 of 33

1 published by ACM
December 2017 ACM Transactions on Parallel Computing (TOPC): Volume 4 Issue 3, January 2018
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 35,   Downloads (12 Months): 35,   Downloads (Overall): 35

Full text available: PDF
Optimizing the performance of stencil algorithms has been the subject of intense research over the last two decades. Since many stencil schemes have low arithmetic intensity, most optimizations focus on increasing the temporal data access locality, thus reducing the data traffic through the main memory interface with the ultimate goal ...
Keywords: Wireless sensor networks, media access control, multi-channel, radio interference, time synchronization

2
October 2017 International Journal of Parallel Programming: Volume 45 Issue 5, October 2017
Publisher: Kluwer Academic Publishers
Bibliometrics:
Citation Count: 0

While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring "standard" as well as "accelerated" resources. Today, such resources are available as multicore processors, graphics processing ...
Keywords: Sparse linear algebra, Large scale computing, Software library, Task parallelism, Heterogeneous computing, Data parallelism

3
November 2016 Journal of Computational Physics: Volume 325 Issue C, November 2016
Publisher: Academic Press Professional, Inc.
Bibliometrics:
Citation Count: 0

We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of ...
Keywords: Performance engineering, Quantum physics, Topological materials, Chebyshev filter polynomials, Interior eigenvalues

4
May 2016 Concurrency and Computation: Practice & Experience: Volume 28 Issue 7, May 2016
Publisher: John Wiley and Sons Ltd.
Bibliometrics:
Citation Count: 1

Memory-bound algorithms show complex performance and energy consumption behavior on multicore processors. We choose the lattice Boltzmann method on an Intel Sandy Bridge cluster as a prototype scenario to investigate if and how single-chip performance and power characteristics can be generalized to the highly parallel case. First, we perform an ...
Keywords: energy optimization, lattice Boltzmann method, ECM performance model

5
April 2016 Proceedings of the 29th International Conference on Architecture of Computing Systems -- ARCS 2016 - Volume 9637
Publisher: Springer-Verlag New York, Inc.
Bibliometrics:
Citation Count: 1

This paper presents an in-depth analysis of Intel's Haswell microarchitecture for streaming loop kernels. Among the new features examined are the dual-ring Uncore design, Cluster-on-Die mode, Uncore Frequency Scaling, enhancements such as new and improved execution units, as well as improvements throughout the memory hierarchy. The Execution-Cache-Memory diagnostic performance model ...
Keywords: Architecture analysis, Intel Haswell, ECM model, Performance modeling

6
February 2016 Concurrency and Computation: Practice & Experience: Volume 28 Issue 2, February 2016
Publisher: John Wiley and Sons Ltd.
Bibliometrics:
Citation Count: 0


7
February 2016 Concurrency and Computation: Practice & Experience: Volume 28 Issue 2, February 2016
Publisher: John Wiley and Sons Ltd.
Bibliometrics:
Citation Count: 11

Modern multi-core chips show complex behavior with respect to performance and power. Starting with the Intel Sandy Bridge processor, it has become possible to directly measure the power dissipation of a CPU chip and correlate this data with the performance properties of the running code. Going beyond a simple bottleneck ...
Keywords: multi-core, power modeling, ECM model, performance modeling

8 published by ACM
November 2015 PMBS '15: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 2,   Downloads (12 Months): 26,   Downloads (Overall): 155

Full text available: PDF
Analytic performance models are essential for understanding the performance characteristics of loop kernels, which consume a major part of CPU cycles in computational science. Starting from a validated performance model one can infer the relevant hardware bottlenecks and promising optimization opportunities. Unfortunately, analytic performance modeling is often tedious even for ...

9
September 2015 CLUSTER '15: Proceedings of the 2015 IEEE International Conference on Cluster Computing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

It is commonly agreed that highly parallel software on Exascale computers will suffer from many more runtime failures due to the decreasing trend in the mean time to failures (MTTF). Therefore, it is not surprising that a lot of research is going on in the area of fault tolerance and ...
Keywords: GASPI, GPI, fault detection, fault tolerance, fault recovery, checkpoint-restart, pre-allocated spare processes

10 published by ACM
June 2015 ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 7,   Downloads (12 Months): 92,   Downloads (Overall): 279

Full text available: PDF
Stencil algorithms on regular lattices appear in many fields of computational science, and much effort has been put into optimized implementations. Such activities are usually not guided by performance models that provide estimates of expected speedup. Understanding the performance properties and bottlenecks by performance modeling enables a clear view on ...
Keywords: stencils, performance model, multicore, optimization

11
May 2015 IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

The Kernel Polynomial Method (KPM) is a well-established scheme in quantum physics and quantum chemistry to determine the eigenvalue density and spectral properties of large sparse matrices. In this work we demonstrate the high optimization potential and feasibility of peta-scale heterogeneous CPU-GPU implementations of the KPM. At the node ...
Keywords: Parallel programming, Quantum mechanics, Performance analysis, Sparse matrices

12
September 2014 ICPPW '14: Proceedings of the 2014 43rd International Conference on Parallel Processing Workshops
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

LIKWID is a set of performance-related command line tools targeting X86 processors. Besides affinity-related tools it also includes likwid-perfctr, which allows counting hardware performance events. LIKWID builds upon the Linux msr kernel module, which allows access to model-specific registers (MSRs) via a device file interface. In addition to a ...
Keywords: Hardware Performance Counters, Profiling, Overhead, Tools, X86

13 published by ACM
February 2014 WPMVP '14: Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 2,   Downloads (12 Months): 40,   Downloads (Overall): 200

Full text available: PDF
Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the efficiency of different SIMD-vectorized implementations of the RabbitCT benchmark. RabbitCT performs 3D image reconstruction by back projection, ...
Keywords: back projection, gather, performance, intel MIC, SIMD, computed tomography

14
May 2013 IPDPSW '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

Today's High Performance Computing (HPC) clusters consist of hundreds of thousands of CPUs, memory units, complex networks, and other components. Such an extreme level of hardware parallelism reduces the mean time to failure (MTTF) of the overall cluster. The future of HPC urgently demands to develop environments that facilitate programs ...
Keywords: fault tolerance, asynchronous checkpointing, multi-stage checkpointing, checkpoint/restart, MPI

15
May 2013 International Journal of High Performance Computing Applications: Volume 27 Issue 2, May 2013
Publisher: Sage Publications, Inc.
Bibliometrics:
Citation Count: 5

Volume reconstruction by backprojection is the computational bottleneck in many interventional clinical computed tomography (CT) applications. Today vendors in this field replace special purpose hardware accelerators with standard hardware such as multicore chips and GPGPUs. Medical imaging algorithms are on the verge of employing high-performance computing (HPC) technology, and are ...
Keywords: SandyBridge (AVX), performance analysis, SIMD, medical imaging, performance optimization

16
March 2013 Computers & Mathematics with Applications: Volume 65 Issue 6, March, 2013
Publisher: Pergamon Press, Inc.
Bibliometrics:
Citation Count: 6

Several possibilities exist to implement the propagation step of lattice Boltzmann methods. This paper describes common implementations and compares the number of memory transfer operations they require per lattice node update. A performance model based on the memory bandwidth is then used to obtain an estimation of the maximum achievable ...
Keywords: Implementation, A-A pattern, Lattice Boltzmann method, Propagation step, Performance

17
August 2012 Euro-Par'12: Proceedings of the 18th international conference on Parallel processing workshops
Publisher: Springer-Verlag
Bibliometrics:
Citation Count: 1

Many tools and libraries employ hardware performance monitoring (HPM) on modern processors, and using this data for performance assessment and as a starting point for code optimizations is very popular. However, such data is only useful if it is interpreted with care, and if the right metrics are chosen for ...

18
August 2012 Euro-Par'12: Proceedings of the 18th international conference on Parallel processing workshops
Publisher: Springer-Verlag
Bibliometrics:
Citation Count: 0

The ultimate purpose of running simulation tasks on high performance computers is to solve numerical problems. The performance of an algorithm, or rather an implementation, is significant in several respects: Either a given problem should be solved in the least possible amount of time or a larger problem should be ...

19
May 2012 IPDPSW '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 4

Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to ...
Keywords: GPGPU, Sparse matrices, CUDA

20
March 2012 SIAM Journal on Scientific Computing: Volume 34 Issue 2, April 2012
Publisher: Society for Industrial and Applied Mathematics
Bibliometrics:
Citation Count: 5

In the last decade, expression templates (ETs) have gained a reputation as an efficient performance optimization tool for C++ codes. This reputation builds on several ET-based linear algebra frameworks focused on combining both elegant and high-performance C++ code. However, on closer examination the assumption that ETs are a performance optimization ...
Keywords: Blitz++, high performance programming, linear algebra, Blaze, Eigen3, Boost, MTL4, expression templates, performance optimization, uBLAS



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.