 Felix Wolf

Bibliometrics: publication history
Average citations per article: 2.00
Citation Count: 10
Publication count: 5
Publication years: 2015-2016
Available for download: 4
Average downloads per article: 109.25
Downloads (cumulative): 437
Downloads (12 Months): 102
Downloads (6 Weeks): 27



6 results found

1 published by ACM
August 2018 ICPP 2018: Proceedings of the 47th International Conference on Parallel Processing
Publisher: ACM
Citation Count: 0
Downloads (6 Weeks): 18,   Downloads (12 Months): 18,   Downloads (Overall): 18

Full text available: PDF
A critical factor for developing robust shared-memory applications is the efficient use of the cache and the communication between threads. Inappropriate data structures, algorithm design, and inefficient thread affinity may result in superfluous communication between threads/cores and severe performance problems. For this reason, state-of-the-art profiling tools focus on thread communication ...
Keywords: Shared memory, communication, data locality, multi-threading, profiling

2 published by ACM
July 2016 ACM Transactions on Parallel Computing (TOPC): Volume 3 Issue 2, August 2016
Publisher: ACM
Citation Count: 0
Downloads (6 Weeks): 10,   Downloads (12 Months): 87,   Downloads (Overall): 87

Full text available: PDF
Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait ...
Keywords: Performance analysis, load imbalance, OpenMP, event tracing, MPI, root-cause analysis

July 2016 Journal of Systems and Software: Volume 117 Issue C, July 2016
Publisher: Elsevier Science Inc.
Citation Count: 0

Detect both loop and task parallelism in a single tool. Identify parallelism based on the concept of computational units (CUs). A ranking method to highlight the most promising parallelization targets. Time and memory overhead that is low enough to deal with real-world applications. The stagnation of single-core performance leaves application developers with software ...
Keywords: Program analysis, Parallelism discovery, Parallelization, Profiling, Data dependence

4 published by ACM
November 2015 VPA '15: Proceedings of the 2nd Workshop on Visual Performance Analysis
Publisher: ACM
Citation Count: 2
Downloads (6 Weeks): 4,   Downloads (12 Months): 11,   Downloads (Overall): 98

Full text available: PDF
Performance-analysis tools are indispensable for understanding and optimizing the behavior of parallel programs running on increasingly powerful supercomputers. However, with size and complexity of hardware and software on the rise, performance data sets are becoming so voluminous that their analysis poses serious challenges. In particular, the search space that must ...

5 published by ACM
November 2015 ESPT '15: Proceedings of the 4th Workshop on Extreme Scale Programming Tools
Publisher: ACM
Citation Count: 0
Downloads (6 Weeks): 2,   Downloads (12 Months): 20,   Downloads (Overall): 45

Full text available: PDF
State-of-the-art performance analysis tools, such as Score-P, record performance profiles on a per-thread basis. However, for exascale systems the number of threads is expected to be on the order of a billion, and this would result in extremely large performance profiles. In most cases the user ...
Keywords: performance analysis, data compression, exascale computing

6 published by ACM
June 2015 ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
Publisher: ACM
Citation Count: 4
Downloads (6 Weeks): 1,   Downloads (12 Months): 25,   Downloads (Overall): 121

Full text available: PDF
Many libraries in the HPC field encapsulate sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm engineers have already been advocating the systematic combination of analytical performance models with practical measurements for a very ...
Keywords: parallel programming, performance analysis, software engineering, high performance computing

The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.