Richard W. Vuduc

http://vuduc.org
richie at cc.gatech.edu

Bibliometrics: publication history
Average citations per article: 14.12
Citation Count: 946
Publication count: 67
Publication years: 2000-2017
Available for download: 33
Average downloads per article: 477.12
Downloads (cumulative): 15,745
Downloads (12 Months): 1,592
Downloads (6 Weeks): 180
Professional ACM Member

67 results found

Result 1 – 20 of 67

1 published by ACM
October 2017 Thematic Workshops '17: Proceedings of the Thematic Workshops of ACM Multimedia 2017
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 9,   Downloads (12 Months): 36,   Downloads (Overall): 36

Full text available: PDF
We consider the problem of how to reduce the cost of communication that is required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, like broadcasts of parameters or reductions for partial gradient aggregations, which for ...
Keywords: deep learning system, mpi collectives, neural networks

2 published by ACM
August 2017 KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 46,   Downloads (12 Months): 254,   Downloads (Overall): 254

Full text available: PDF
In exploratory tensor mining, a common problem is how to analyze a set of variables across a set of subjects whose observations do not align naturally. For example, when modeling medical features across a set of patients, the number and duration of treatments may vary widely in time, meaning there ...
Keywords: parafac2, phenotyping, unsupervised learning, sparse tensor factorization

3
April 2017 IEEE Transactions on Parallel and Distributed Systems: Volume 28 Issue 4, April 2017
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0

We consider the problem of how to design and implement communication-efficient versions of parallel kernel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input ...

4
November 2016 IA^3 '16: Proceedings of the Sixth Workshop on Irregular Applications: Architectures and Algorithms
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 5,   Downloads (12 Months): 62,   Downloads (Overall): 78

Full text available: PDF
This paper presents the optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms. This primitive is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition. We first design and implement sequential SpTTM to avoid explicit ...

5 published by ACM
May 2016 FTXS '16: Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1,   Downloads (12 Months): 28,   Downloads (Overall): 69

Full text available: PDF
We present a new fault-tolerant algorithm for the problem of computing the connected components of a graph. Our algorithm derives from a highly parallel but non-resilient algorithm, which is based on the technique of label propagation (LP). To make the LP algorithm resilient to transient soft faults, we apply an ...
Keywords: label propagation, self-stabilization, fault-tolerance, selective reliability, transient soft faults

6 published by ACM
November 2015 SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 5,   Downloads (12 Months): 70,   Downloads (Overall): 190

Full text available: PDF
This paper describes a novel framework, called InTensLi ("intensely"), for producing fast single-node implementations of dense tensor-times-matrix multiply (TTM) of arbitrary dimension. Whereas conventional implementations of TTM rely on explicitly converting the input tensor operand into a matrix---in order to be able ...
Keywords: code generation, offline autotuning, tensor operation, multilinear algebra

7 published by ACM
November 2015 IA3 '15: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 5,   Downloads (12 Months): 75,   Downloads (Overall): 183

Full text available: PDF
Volume data has extensive uses in medical imaging, such as magnetic resonance imaging (MRI) scans; in visual effects production, including volume rendering and fluid simulation; and in computer-aided design and manufacturing (CAD/CAM) for advanced prototyping, such as 3D printing, among others. This work presents a compact hierarchical data structure, dubbed HDT, for extreme-scale volume ...
Keywords: voxels, rapid prototyping, octree, GPGPU acceleration

8
November 2015 ICDM '15: Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM)
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 3

We propose a new tensor factorization method, called the Sparse Hierarchical-Tucker (Sparse H-Tucker), for sparse and high-order data tensors. Sparse H-Tucker is inspired by its namesake, the classical Hierarchical Tucker method, which aims to compute a tree-structured factorization of an input data set that may be readily interpreted by a ...

9 published by ACM
June 2015 SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 3,   Downloads (12 Months): 59,   Downloads (Overall): 233

Full text available: PDF
This paper quantifies the impact of branches and branch mispredictions on the single-core performance of certain graph problems, specifically for computing connected components. We show that branch mispredictions are costly and can reduce performance by as much as 30%-50%. This insight suggests that one should seek graph algorithms and implementations ...
Keywords: connected components, performance engineering, code generation, predication, branch prediction

10
May 2015 IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi co-processors. It builds on the algorithmic approach of SuperLU_DIST, which is right-looking and statically pivoted. Our contribution is a novel algorithm, called HALO. The name is shorthand for highly asynchronous ...
Keywords: Sparse Direct Solver, Xeon-Phi acceleration, GPU, MPI, OpenMP, Communication-avoiding algorithm, Heterogeneous computing

11
May 2015 IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 4

We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data ...
Keywords: distributed memory algorithms, communication avoidance, statistical machine learning

12
May 2015 Software Testing, Verification & Reliability: Volume 25 Issue 3, May 2015
Publisher: John Wiley and Sons Ltd.
Bibliometrics:
Citation Count: 1

UNICORN is an automated dynamic pattern-detection-based technique that finds and ranks problematic memory access patterns for non-deadlock concurrency bugs. It monitors pairs of memory accesses, combines the pairs into problematic patterns and ranks the patterns by their suspiciousness scores. It detects significant classes of bug types, including order violations and ...
Keywords: concurrency, multithreaded programs, fault localization, debugging

13
June 2014 ISCA '14: Proceeding of the 41st annual international symposium on Computer architecture
Publisher: IEEE Press
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 3,   Downloads (12 Months): 91,   Downloads (Overall): 635

Full text available: PDF
Traditionally, architectural innovations designed to boost single-threaded performance incur overhead costs which significantly increase power consumption. In many cases the increase in power exceeds the improvement in performance, resulting in a net increase in energy consumption. Thus, it is reasonable to assume that modern attempts to improve single-threaded performance will ...
Also published in:
October 2014 ACM SIGARCH Computer Architecture News - ISCA '14: Volume 42 Issue 3, June 2014

14
May 2014 IPDPS '14: Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 10

We conducted a microbenchmarking study of the time, energy, and power of computation and memory access on several existing platforms. These platforms represent candidate compute-node building blocks of future high-performance computing systems. Our analysis uses the "energy roofline" model, developed in prior work, which we extend in two ways. ...
Keywords: energy, power, algorithms, system balance, performance modeling

15 published by ACM
March 2014 GPGPU-7: Proceedings of Workshop on General Purpose Processing Using GPUs
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 1,   Downloads (12 Months): 18,   Downloads (Overall): 147

Full text available: PDF
This paper presents an optimized CPU--GPU hybrid implementation and a GPU performance model for the kernel-independent fast multipole method (FMM). We implement an optimized kernel-independent FMM for GPUs, and combine it with our previous CPU implementation to create a hybrid CPU+GPU FMM kernel. When compared to another highly optimized GPU ...
Keywords: GPU, exascale, fast multipole method, hybrid, performance model, multicore

16
February 2014 Statistical Analysis and Data Mining: Volume 7 Issue 1, February 2014
Publisher: John Wiley & Sons, Inc.
Bibliometrics:
Citation Count: 0

Kernel summations are a ubiquitous key computational bottleneck in many data analysis methods. In this paper, we attempt to marry, for the first time, the best relevant techniques in parallel computing, where kernel summations are in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide ...
Keywords: CUDA, kernel methods, parallel multidimensional trees, GPGPU, nonparametric methods, parallel machine learning

17 published by ACM
November 2013 ScalA '13: Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Publisher: ACM
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 6,   Downloads (12 Months): 36,   Downloads (Overall): 280

Full text available: PDF
We show how to use the idea of self-stabilization, which originates in the context of distributed control, to make fault-tolerant iterative solvers. Generally, a self-stabilizing system is one that, starting from an arbitrary state (valid or invalid), reaches a valid state within a finite number of steps. This property ...
Keywords: transient soft faults, self-stabilization, fault-tolerance, iterative linear solvers

18
November 2013 International Journal of High Performance Computing Applications: Volume 27 Issue 4, November 2013
Publisher: Sage Publications, Inc.
Bibliometrics:
Citation Count: 0


19 published by ACM
July 2013 ISSTA 2013: Proceedings of the 2013 International Symposium on Software Testing and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 2,   Downloads (12 Months): 15,   Downloads (Overall): 215

Full text available: PDF
This paper presents Griffin, a new fault-comprehension technique. Griffin provides a way to explain concurrency bugs using additional information over existing fault-localization techniques, and thus bridges the gap between fault-localization and fault-fixing techniques. Griffin inputs a list of memory-access patterns and a coverage matrix, groups those patterns responsible for ...
Keywords: Concurrency, Fault Comprehension, Debugging

20
May 2013 IPDPS '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 9

We consider the problem of how to enable computer architects and algorithm designers to reason directly and analytically about the relationship between high-level architectural features and algorithm characteristics. We propose a modeling framework designed to help understand the long-term and high-level impacts of algorithmic and technology trends. This model connects ...
Keywords: exascale, algorithm-architecture codesign



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.