 Christopher Justin Hughes

Bibliometrics: publication history
Average citations per article: 22.11
Citation Count: 597
Publication count: 27
Publication years: 1996–2017
Available for download: 18
Average downloads per article: 676.89
Downloads (cumulative): 12,184
Downloads (12 Months): 1,046
Downloads (6 Weeks): 111
27 results found (showing results 1–20 of 27)

1 published by ACM
October 2017 MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 56,   Downloads (12 Months): 630,   Downloads (Overall): 630

Full text available: PDF
Placing the DRAM in the same package as a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM ...
Keywords: main memory, DRAM cache, TLB coherence, cache replacement, hybrid memory systems, in-package DRAM

2 published by ACM
December 2015 MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 4,   Downloads (12 Months): 54,   Downloads (Overall): 265

Full text available: PDF
Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and large bandwidth requirements. A traditional streaming ...

3 published by ACM
November 2013 SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 3,   Downloads (12 Months): 14,   Downloads (Overall): 354

Full text available: PDF
As cache hierarchies become deeper and the number of cores on a chip increases, managing caches becomes more important for performance and energy. However, current hardware cache management policies do not always adapt optimally to the applications' behavior: e.g., caches may be polluted by data structures whose locality cannot be ...
Keywords: producer-consumer communication, reuse distance, streaming memory accesses, energy-efficient memory hierarchy

4 published by ACM
November 2013 SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 68
Downloads (6 Weeks): 13,   Downloads (12 Months): 122,   Downloads (Overall): 947

Full text available: PDF
Intel has recently introduced Intel® Transactional Synchronization Extensions (Intel® TSX) in the Intel 4th Generation Core™ Processors. With Intel TSX, a processor can dynamically determine whether threads need to serialize through lock-protected critical sections. In this paper, we evaluate the first hardware implementation of Intel TSX using a ...
Keywords: transactional memory, high-performance computing

5 published by ACM
July 2013 SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 6,   Downloads (12 Months): 31,   Downloads (Overall): 219

Full text available: PDF
As we increase the number of cores on a processor die, the on-chip cache hierarchies that support these cores are getting larger, deeper, and more complex. As a result, non-uniform memory access effects are now prevalent even on a single chip. To reduce execution time and energy consumption, data access ...
Keywords: task scheduling, performance, task stealing, energy, locality

6 published by ACM
June 2011 ACM Transactions on Architecture and Code Optimization (TACO): Volume 8 Issue 2, July 2011
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 10,   Downloads (Overall): 344

Full text available: PDF
While multicore processors promise large performance benefits for parallel applications, writing these applications is notoriously difficult. Tuning a parallel application to achieve good performance, also known as performance debugging, is often more challenging than debugging the application for correctness. Parallel programs have many performance-related issues that are not seen in ...
Keywords: Performance debugging, coherence misses, false sharing, multicore processors

7 published by ACM
June 2011 ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
Publisher: ACM
Bibliometrics:
Citation Count: 7
Downloads (6 Weeks): 0,   Downloads (12 Months): 15,   Downloads (Overall): 685

Full text available: PDF
In recent years, the increasing number of processor cores and limited increases in main memory bandwidth have led to the problem of the bandwidth wall, where memory bandwidth is becoming a performance bottleneck. This is especially true for emerging latency-insensitive, bandwidth-sensitive applications. Designing the memory hierarchy for a platform ...
Keywords: bandwidth, memory hierarchy, memory model, power consumption, throughput computing
Also published in:
June 2011  ACM SIGARCH Computer Architecture News - ISCA '11: Volume 39 Issue 3, June 2011

8 published by ACM
May 2011 ICSE '11: Proceedings of the 33rd International Conference on Software Engineering
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 4,   Downloads (12 Months): 11,   Downloads (Overall): 230

Full text available: PDF
With the ubiquity of multi-core processors, software must make effective use of multiple cores to obtain good performance on modern hardware. One of the biggest roadblocks to this is load imbalance, or the uneven distribution of work across cores. We propose LIME, a framework for analyzing parallel programs and ...
Keywords: performance debugging, load imbalance, parallel section

9
November 2010 IEEE Micro: Volume 30 Issue 6, November 2010
Publisher: IEEE Computer Society Press
Bibliometrics:
Citation Count: 4

Processors that target throughput computing often have many cores, which stresses the cache hierarchy. Logically centralized, shared data storage is needed for many-core chips to provide high cache throughput for heavily read-write shared lines. Techniques to reduce on-die and off-die traffic have a dramatic energy benefit for many-core chips.
Keywords: throughput computing, memory hierarchy, graphics processors, multicore/single-chip multiprocessors

10
June 2009 ICME'09: Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Publisher: IEEE Press
Bibliometrics:
Citation Count: 2

Parallel scalability allows an application to efficiently utilize an increasing number of processing elements. In this paper we explore a design space for application scalability for an inference engine in large vocabulary continuous speech recognition (LVCSR). Our implementation of the inference engine involves a parallel graph traversal through an irregular ...

11
June 2008 ISCA '08: Proceedings of the 35th Annual International Symposium on Computer Architecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 11
Downloads (6 Weeks): 1,   Downloads (12 Months): 7,   Downloads (Overall): 763

Full text available: PDF
The current trend is for processors to deliver dramatic improvements in parallel performance while only modestly improving serial performance. Parallel performance is harvested through vector/SIMD instructions as well as multithreading (through both multithreaded cores and chip multiprocessors). Vector parallelism can be more efficiently supported than multithreading, but is often harder ...
Keywords: locks, reductions, vector, SIMD, multiprocessors
Also published in:
June 2008  ACM SIGARCH Computer Architecture News: Volume 36 Issue 3, June 2008

12 published by ACM
June 2007 ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
Publisher: ACM
Bibliometrics:
Citation Count: 60
Downloads (6 Weeks): 11,   Downloads (12 Months): 64,   Downloads (Overall): 1,692

Full text available: PDF
Chip multiprocessors (CMPs) are now commonplace, and the number of cores on a CMP is likely to grow steadily. However, in order to harness the additional compute resources of a CMP, applications must expose their thread-level parallelism to the hardware. One common approach to doing this is to decompose a ...
Keywords: CMP, architectural support, loop and task parallelism
Also published in:
June 2007  ACM SIGARCH Computer Architecture News: Volume 35 Issue 2, May 2007

13 published by ACM
June 2007 ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
Publisher: ACM
Bibliometrics:
Citation Count: 15
Downloads (6 Weeks): 1,   Downloads (12 Months): 17,   Downloads (Overall): 829

Full text available: PDF
We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications in this domain model and simulate natural phenomena, and often direct visual components of motion pictures. We ...
Keywords: CMP, characterization, parallelization, physical simulation
Also published in:
June 2007  ACM SIGARCH Computer Architecture News: Volume 35 Issue 2, May 2007

14 published by ACM
August 2006 KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 1,   Downloads (12 Months): 20,   Downloads (Overall): 754

Full text available: PDF
Traditional decomposition-based solutions to Support Vector Machines (SVMs) suffer from the widely-known scalability problem. For example, given a one-million training set, it takes about six days for SVMLight to run on a Pentium-4 server with 8G-byte memory. In this paper, we propose an incremental algorithm, which performs approximate matrix-factorization operations, ...
Keywords: matrix factorization, support vector machines, interior-point method

15 published by ACM
March 2006 PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Publisher: ACM
Bibliometrics:
Citation Count: 74
Downloads (6 Weeks): 6,   Downloads (12 Months): 30,   Downloads (Overall): 1,280

Full text available: PDF
High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they have a number of disadvantages, including possibly unnecessary serialization, and possible deadlock. Transactional memory ...
Keywords: architecture support, nonblocking, transactional memory, transactions

16
April 2005 Journal of Parallel and Distributed Computing: Volume 65 Issue 4, April 2005
Publisher: Academic Press, Inc.
Bibliometrics:
Citation Count: 9

This paper studies a memory-side prefetching technique to hide latency incurred by inherently serial accesses to linked data structures (LDS). A programmable engine sits close to memory and traverses LDS independently from the processor. The engine can run ahead of the processor because of its low latency path to memory, ...
Keywords: Prefetching, Processor-in-memory, Linked data structures

17 published by ACM
March 2004 ACM SIGARCH Computer Architecture News - ISCA 2004: Volume 32 Issue 2, March 2004
Publisher: ACM
Bibliometrics:
Citation Count: 12
Downloads (6 Weeks): 1,   Downloads (12 Months): 5,   Downloads (Overall): 454

Full text available: PDF
Much research has recently been done on adapting architectural resources of general-purpose processors to save energy at the cost of increased execution time. This work examines adaptation control algorithms for such processors running real-time multimedia applications. The best previous algorithms are mostly heuristics-based and ad hoc, requiring an impractically large amount of application- and resource-specific tuning. We take ...
Also published in:
June 2004  ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture

18
January 2003
Bibliometrics:
Citation Count: 1

Real-time multimedia applications, already a key workload for many computer systems, are increasingly important. The increasing complexity and performance demands of these applications motivate the use of general-purpose processors for them. However, multimedia applications introduce a number of new challenges to general-purpose processor design because of their real-time nature and ...

19
December 2002 RTSS '02: Proceedings of the 23rd IEEE Real-Time Systems Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 25

Simultaneous multithreading (SMT) improves processor throughput by processing instructions from multiple threads each cycle. This is the first work to explore soft real-time scheduling on an SMT processor. Scheduling with SMT requires two decisions: (1) which threads to run simultaneously (the co-schedule), and (2) how to share processor resources ...

20 published by ACM
October 2002 ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Publisher: ACM
Bibliometrics:
Citation Count: 25
Downloads (6 Weeks): 2,   Downloads (12 Months): 8,   Downloads (Overall): 620

Full text available: PDF
This work concerns algorithms to control energy-driven architecture adaptations for multimedia applications, without and with dynamic voltage scaling (DVS). We identify a broad design space for adaptation control algorithms based on two attributes: (1) when to adapt or temporal granularity and (2) what structures to adapt or spatial granularity. For ...
Also published in:
October 2002  ACM SIGPLAN Notices: Volume 37 Issue 10, October 2002
December 2002  ACM SIGARCH Computer Architecture News - Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems: Volume 30 Issue 5, December 2002
December 2002  ACM SIGOPS Operating Systems Review: Volume 36 Issue 5, December 2002



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.