 Neal Clayton Crago

Bibliometrics: publication history
Average citations per article: 11.43
Citation count: 80
Publication count: 7
Publication years: 2008-2015
Available for download: 5
Average downloads per article: 921.20
Downloads (cumulative): 4,606
Downloads (12 Months): 346
Downloads (6 Weeks): 21

1
September 2015 ACM Transactions on Computer Systems (TOCS): Volume 33 Issue 3, September 2015
Publisher: ACM
Citation Count: 3
Downloads (6 Weeks): 6,   Downloads (12 Months): 82,   Downloads (Overall): 337

There has been recent interest in exploring the acceleration of nonvectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the ...
Keywords: Spatial programming, reconfigurable accelerators

2
June 2013 ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
Publisher: ACM
Citation Count: 18
Downloads (6 Weeks): 5,   Downloads (12 Months): 178,   Downloads (Overall): 1,191

In this paper, we present triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. They also allow efficient reactivity to inter-PE communication ...
Keywords: reconfigurable accelerators, spatial programming
Also published in:
June 2013 ACM SIGARCH Computer Architecture News - ISCA '13: Volume 41 Issue 3, June 2013

3
February 2013 HPCA '13: Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
Publisher: IEEE Computer Society
Citation Count: 0

Currently, GPUs and data parallel processors leverage latency tolerance techniques such as multithreading and prefetching to maximize performance per Watt. However, choosing a technique that provides energy-efficiency on a wide variety of workloads is difficult, as the type of latency to tolerate, required hardware complexity, and energy consumption is directly ...

4
October 2011 PACT '11: Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Publisher: IEEE Computer Society
Citation Count: 0

In this paper we present OUTRIDERHP, a novel implementation of a decoupled architecture that approaches the performance of contemporary out-of-order processors on parallel benchmarks while maintaining low hardware complexity. OUTRIDERHP leverages the compiler to separate a single thread of execution into memory-accessing and memory-consuming streams that can be executed concurrently, ...
Keywords: High-Performance, Decoupled, Processor, Computer Architecture

5
June 2011 ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
Publisher: ACM
Citation Count: 7
Downloads (6 Weeks): 5,   Downloads (12 Months): 28,   Downloads (Overall): 562

We present OUTRIDER, an architecture for throughput-oriented processors that provides memory latency tolerance to improve performance on highly threaded workloads. OUTRIDER enables a single thread of execution to be presented to the architecture as multiple decoupled instruction streams that separate memory-accessing and memory-consuming instructions. The key insight is that by ...
Keywords: accelerator, memory latency, computer architecture
Also published in:
June 2011 ACM SIGARCH Computer Architecture News - ISCA '11: Volume 39 Issue 3, June 2011

6
June 2009 ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
Publisher: ACM
Citation Count: 36
Downloads (6 Weeks): 6,   Downloads (12 Months): 50,   Downloads (Overall): 1,903

This paper considers Rigel, a programmable accelerator architecture for a broad class of data- and task-parallel computation. Rigel comprises 1000+ hierarchically-organized cores that use a fine-grained, dynamically scheduled single-program, multiple-data (SPMD) execution model. Rigel's low-level programming interface adopts a single global address space model where parallel work is expressed in ...
Keywords: computer architecture, low-level programming interface, accelerator
Also published in:
June 2009 ACM SIGARCH Computer Architecture News: Volume 37 Issue 3, June 2009

7
November 2008 MICRO 41: Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Publisher: IEEE Computer Society
Citation Count: 16
Downloads (6 Weeks): 0,   Downloads (12 Months): 9,   Downloads (Overall): 614

Visualization, interaction, and simulation (VIS) constitute a class of applications that is growing in importance. This class includes applications such as graphics rendering, video encoding, simulation, and computer vision. These applications are ideally suited for accelerators because of their parallelizability and demand for high throughput. We compile a benchmark suite, ...

The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.