Torsten Hoefler
htor at inf.ethz.ch

Bibliometrics: publication history
Average citations per article: 7.35
Citation Count: 941
Publication count: 128
Publication years: 2005-2017
Available for download: 63
Average downloads per article: 224.60
Downloads (cumulative): 14,150
Downloads (12 Months): 3,246
Downloads (6 Weeks): 304

Result 1 – 20 of 133

1 published by ACM
April 2018 EuroSys '18: Proceedings of the Thirteenth EuroSys Conference
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 22,   Downloads (12 Months): 166,   Downloads (Overall): 167

Full text available: PDF
In-memory key-value stores (KVSs) provide different forms of resilience through basic r-way replication and complex erasure codes such as Reed-Solomon. Each storage scheme exhibits different tradeoffs in terms of reliability and resources used (memory, network load, latency, storage required, etc.). Unfortunately, most KVSs support only a single such storage ...
Keywords: key-value store, reed-solomon, replication, resilience management

2 published by ACM
March 2018 ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 16,   Downloads (12 Months): 146,   Downloads (Overall): 146

Full text available: PDF
Emerging chips with hundreds and thousands of cores require networks with unprecedented energy/area efficiency and scalability. To address this, we propose Slim NoC (SN): a new on-chip network design that delivers significant improvements in efficiency and scalability compared to the state-of-the-art. The key idea is to use two concepts from ...
Keywords: energy efficiency, many-core systems, on-chip-networks, parallel processing, scalability

3 published by ACM
February 2018 PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 15,   Downloads (12 Months): 168,   Downloads (Overall): 168

Full text available: PDF
We present novel scalable parallel algorithms for finding global minimum cuts and connected components, which are important and fundamental problems in graph processing. To take advantage of future massively parallel architectures, our algorithms are communication-avoiding: they reduce the costs of communication across the network and the cache hierarchy. The ...
Keywords: graph algorithms, minimum cuts, parallel computing, randomized algorithms
Also published in:
March 2018  ACM SIGPLAN Notices - PPoPP '18: Volume 53 Issue 1, January 2018

4 published by ACM
February 2018 PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 11,   Downloads (12 Months): 111,   Downloads (Overall): 111

Full text available: PDF
Massive spatial parallelism at low energy gives FPGAs the potential to be core components in large scale high performance computing (HPC) systems. In this paper we present four major design steps that harness high-level synthesis (HLS) to implement scalable spatial FPGA algorithms. To aid productivity, we introduce the open source ...
Also published in:
March 2018  ACM SIGPLAN Notices - PPoPP '18: Volume 53 Issue 1, January 2018

5 published by ACM
November 2017 SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 10,   Downloads (12 Months): 194,   Downloads (Overall): 194

Full text available: PDF
Betweenness centrality (BC) is a crucial graph problem that measures the significance of a vertex by the number of shortest paths leading through it. We propose Maximal Frontier Betweenness Centrality (MFBC): a succinct BC algorithm based on novel sparse matrix multiplication routines that performs a factor of p^{1/3} less ...
Keywords: sparse matrix multiplication, communication cost, parallel algorithm, betweenness centrality

6 published by ACM
November 2017 SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 13,   Downloads (12 Months): 215,   Downloads (Overall): 215

Full text available: PDF
Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data movement. However, these devices are limited to fixed functions, such as remote direct memory access. We develop sPIN, a portable programming model ...

7 published by ACM
July 2017 SPAA '17: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 7,   Downloads (12 Months): 76,   Downloads (Overall): 112

Full text available: PDF
Many large-scale scientific computations require eigenvalue solvers in a scaling regime where efficiency is limited by data movement. We introduce a parallel algorithm for computing the eigenvalues of a dense symmetric matrix, which performs asymptotically less communication than previously known approaches. We provide analysis in the Bulk Synchronous Parallel (BSP) ...
Keywords: communication cost, parallel algorithms, symmetric eigenvalue problem

8 published by ACM
June 2017 HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 6,   Downloads (12 Months): 76,   Downloads (Overall): 101

Full text available: PDF
Many distributed systems require coordination between the components involved. With the steady growth of such systems, the probability of failures increases, which necessitates scalable fault-tolerant agreement protocols. The most common practical agreement protocol, for such scenarios, is leader-based atomic broadcast. In this work, we propose AllConcur, a distributed system that ...
Keywords: distributed agreement, leaderless atomic broadcast, reliability

9 published by ACM
June 2017 HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 12,   Downloads (12 Months): 172,   Downloads (Overall): 212

Full text available: PDF
We reduce the cost of communication and synchronization in graph processing by analyzing the fastest way to process graphs: pushing the updates to a shared state or pulling the updates to a private state. We investigate the applicability of this push-pull dichotomy to various algorithms and its impact on complexity, ...
Keywords: graph computations

10 published by ACM
January 2017 PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 4,   Downloads (12 Months): 68,   Downloads (Overall): 163

Full text available: PDF
In the many-core era, the performance of MPI collectives is more dependent on the intra-node communication component. However, the communication algorithms generally inherit from the inter-node version and ignore the cache complexity. We propose cache-oblivious algorithms for MPI all-to-all operations, in which data blocks are copied into the receive buffers ...
Keywords: many-core, cache-oblivious algorithms, mpi_alltoall
Also published in:
October 2017  ACM SIGPLAN Notices - PPoPP '17: Volume 52 Issue 8, August 2017

11 published by ACM
January 2017 PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 10,   Downloads (12 Months): 107,   Downloads (Overall): 250

Full text available: PDF
Task-based programming offers an elegant way to express units of computation and the dependencies among them, making it easier to distribute the computational load evenly across multiple cores. However, this separation of problem decomposition and parallelism requires a sufficiently large input problem to achieve satisfactory efficiency on a given number ...
Keywords: parallel programming, performance analysis, performance modeling, tasking, co-design, isoefficiency
Also published in:
October 2017  ACM SIGPLAN Notices - PPoPP '17: Volume 52 Issue 8, August 2017

12
January 2017 Proceedings of the VLDB Endowment: Volume 10 Issue 5, January 2017
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 12,   Downloads (12 Months): 130,   Downloads (Overall): 183

Full text available: PDF
Traditional database operators such as joins are relevant not only in the context of database engines but also as a building block in many computational and machine learning algorithms. With the advent of big data, there is an increasing demand for efficient join algorithms that can scale with the input ...

13
November 2016 SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 6,   Downloads (12 Months): 41,   Downloads (Overall): 159

Full text available: PDF
The interconnection network has a large influence on total cost, application performance, energy consumption, and overall system efficiency of a supercomputer. Unfortunately, today's routing algorithms do not utilize this important resource most efficiently. We first demonstrate this by defining the dark fiber metric as a measure of unused resource in ...
Keywords: high performance computing, unicast, computer network management, routing protocols

14
November 2016 SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 4,   Downloads (12 Months): 23,   Downloads (Overall): 109

Full text available: PDF
MeteoSwiss, the Swiss national weather forecast institute, has selected densely populated accelerator servers as their primary system to compute weather forecast simulation. Servers with multiple accelerator devices that are primarily connected by a PCI-Express (PCIe) network achieve a significantly higher energy efficiency. Memory transfers between accelerators in such a system ...
Keywords: PCI-express, multiple GPUs, performance model

15
November 2016 SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 2,   Downloads (12 Months): 22,   Downloads (Overall): 84

Full text available: PDF
The goal of the extreme scale plasma turbulence studies described in this paper is to expedite the delivery of reliable predictions on confinement physics in large magnetic fusion systems by using world-class supercomputers to carry out simulations with unprecedented resolution and temporal duration. This has involved architecture-dependent optimizations of performance ...

16
November 2016 SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 49,   Downloads (Overall): 217

Full text available: PDF
Over the last decade, CUDA and the underlying GPU hardware architecture have continuously gained popularity in various high-performance computing application domains such as climate modeling, computational chemistry, or machine learning. Despite this popularity, we lack a single coherent programming model for GPU clusters. We therefore introduce the dCUDA programming model, ...
Keywords: programming model, gpu, remote memory access, distributed memory, latency hiding

17 published by ACM
October 2016 OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 3,   Downloads (12 Months): 56,   Downloads (Overall): 162

Full text available: PDF
Recent advances in networking hardware have led to a new generation of Remote Memory Access (RMA) networks in which processors from different machines can communicate directly, bypassing the operating system and allowing higher performance. Researchers and practitioners have proposed libraries and programming models for RMA to enable the development of ...
Keywords: Memory Model
Also published in:
December 2016  ACM SIGPLAN Notices - OOPSLA '16: Volume 51 Issue 10, October 2016

18
October 2016 IEEE Transactions on Parallel and Distributed Systems: Volume 27 Issue 10, October 2016
Publisher: IEEE Press
Bibliometrics:
Citation Count: 1

The increase in the number of cores per processor and the complexity of memory hierarchies make cache coherence key for programmability of current shared memory systems. However, ignoring its detailed architectural characteristics can harm performance significantly. In order to assist performance-centric programming, we propose a methodology to allow semi-automatic performance ...

19
July 2016 IEEE Micro: Volume 36 Issue 4, July 2016
Publisher: IEEE Computer Society Press
Bibliometrics:
Citation Count: 2

Network interface cards are one of the key components to achieve efficient parallel performance. In the past, they have gained new functionalities, such as lossless transmission and remote direct memory access, that are now ubiquitous in high-performance systems. Prototypes of next-generation network cards now offer new features that facilitate device ...

20 published by ACM
June 2016 PASC '16: Proceedings of the Platform for Advanced Scientific Computing Conference
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2,   Downloads (12 Months): 48,   Downloads (Overall): 72

Full text available: PDF
We discuss the paper selection process of the ACM PASC16 conference. The conference spans multiple scientific fields used to very different publication cultures. We aim to combine the strengths of the conference and journal publication schemes in order to design an attractive high-quality publication venue for works ...
Keywords: PASC conference series, review processes, paper selection in HPC


