
ABSTRACT

Recently, a number of thread-based prefetching techniques have been proposed. These techniques aim to reduce the memory latency experienced by single-threaded applications by using multithreading resources to prefetch data via speculative prefetch threads. Software-based speculative precomputation (SSP) is one such technique, proposed for multithreaded Itanium processor models. SSP does not require expensive hardware support; instead, it relies on the compiler to adapt binaries so that prefetching runs on otherwise idle hardware thread contexts at run time. This paper presents a post-pass compilation tool for generating SSP-enhanced binaries. The tool is able to: (1) analyze a single-threaded application to generate prefetch threads; (2) identify and embed trigger points in the original binary; and (3) produce a new binary that has the prefetch threads attached. When the new binary executes, it spawns the speculative prefetch threads, which run concurrently with the main thread. Our results indicate that for a set of pointer-intensive benchmarks, the prefetching performed by the speculative threads achieves an average speedup of 87% on an in-order processor and 5% on an out-of-order processor.
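The analysis step (1) can be illustrated with a toy model. This is a minimal sketch, not the paper's tool, which operates on Itanium binaries: it assumes a hypothetical three-address IR (`Instr`) for a single loop body and uses backward program slicing (assumed here as the way to extract the instructions a prefetch thread needs) to find everything that computes the address of a delinquent load. The slice becomes the prefetch-thread body with the load turned into a prefetch, and `live_ins` lists the registers the main thread would hand over when it reaches a trigger point. The names `Instr`, `reaching_def`, `backward_slice`, `make_prefetch_thread` and `live_ins` are illustrative only; how the real tool picks delinquent loads and places triggers is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """One instruction in a hypothetical three-address IR (not IA-64)."""
    dest: Optional[str]     # register written, or None (e.g. a prefetch)
    op: str                 # "load", "add", "prefetch", ...
    srcs: Tuple[str, ...]   # registers read; for a load, the address operands


def reaching_def(body: List[Instr], pos: int, reg: str) -> Optional[int]:
    """Index of the instruction whose value of `reg` reaches body[pos].

    The body is treated as a loop: if no earlier instruction in the same
    iteration writes `reg`, the last write in the previous iteration reaches
    it (a loop-carried dependence such as p = p->next).
    """
    for i in range(pos - 1, -1, -1):             # same iteration, walking backward
        if body[i].dest == reg:
            return i
    for i in range(len(body) - 1, pos - 1, -1):  # previous iteration
        if body[i].dest == reg:
            return i
    return None                                  # defined before the loop


def backward_slice(body: List[Instr], target: int) -> List[int]:
    """Indices, in program order, of every instruction body[target] depends on."""
    in_slice = {target}
    worklist = [(target, r) for r in body[target].srcs]
    while worklist:
        pos, reg = worklist.pop()
        d = reaching_def(body, pos, reg)
        if d is not None and d not in in_slice:
            in_slice.add(d)
            worklist.extend((d, r) for r in body[d].srcs)
    return sorted(in_slice)


def make_prefetch_thread(body: List[Instr], target: int) -> List[Instr]:
    """Keep only the slice; the delinquent load itself becomes a prefetch."""
    return [Instr(None, "prefetch", body[i].srcs) if i == target else body[i]
            for i in backward_slice(body, target)]


def live_ins(body: List[Instr], slice_idx: List[int]) -> Set[str]:
    """Registers the slice reads before it writes them itself.

    These are the values the main thread would copy to the prefetch
    thread at the (hypothetical) trigger point.
    """
    defined: Set[str] = set()
    needed: Set[str] = set()
    for i in slice_idx:
        needed.update(r for r in body[i].srcs if r not in defined)
        if body[i].dest is not None:
            defined.add(body[i].dest)
    return needed


# Toy pointer-chasing loop: while (p) { s += p->val; p = p->next; }
# Field offsets are omitted from the IR for brevity.
body = [
    Instr("v", "load", ("p",)),      # 0: v = p->val  (assumed delinquent load)
    Instr("s", "add", ("s", "v")),   # 1: s += v      (not needed for prefetching)
    Instr("p", "load", ("p",)),      # 2: p = p->next
]

slice_idx = backward_slice(body, 0)
thread = make_prefetch_thread(body, 0)
print(slice_idx)                          # [0, 2]
print([ins.op for ins in thread])         # ['prefetch', 'load']
print(sorted(live_ins(body, slice_idx)))  # ['p']
```

The accumulation `s += v` is left out of the slice because no load address depends on it; in this toy model, that pruning is what keeps the prefetch thread smaller than the main loop so it can run ahead.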

AUTHORS



Steve S.W. Liao

Bibliometrics: publication history
Publication years: 2002-2004
Publication count: 2
Citation count: 76
Available for download: 2
Downloads (6 weeks): 1
Downloads (12 months): 27
Downloads (cumulative): 1,029
Average downloads per article: 514.50
Average citations per article: 38.00


Perry H. Wang

Bibliometrics: publication history
Publication years: 2002-2011
Publication count: 12
Citation count: 192
Available for download: 9
Downloads (6 weeks): 20
Downloads (12 months): 209
Downloads (cumulative): 12,147
Average downloads per article: 1,349.67
Average citations per article: 16.00


Hong Wang

Bibliometrics: publication history
Publication years: 1995-2011
Publication count: 27
Citation count: 585
Available for download: 20
Downloads (6 weeks): 44
Downloads (12 months): 563
Downloads (cumulative): 20,367
Average downloads per article: 1,018.35
Average citations per article: 21.67


Gerolf Hoflehner

Bibliometrics: publication history
Publication years: 2000-2011
Publication count: 9
Citation count: 77
Available for download: 5
Downloads (6 weeks): 5
Downloads (12 months): 37
Downloads (cumulative): 1,916
Average downloads per article: 383.20
Average citations per article: 8.56


Daniel Lavery

Bibliometrics: publication history
Publication years: 1992-2009
Publication count: 17
Citation count: 499
Available for download: 9
Downloads (6 weeks): 4
Downloads (12 months): 92
Downloads (cumulative): 4,694
Average downloads per article: 521.56
Average citations per article: 29.35


John P. Shen

Bibliometrics: publication history
Publication years: 2001-2008
Publication count: 17
Citation count: 545
Available for download: 11
Downloads (6 weeks): 20
Downloads (12 months): 215
Downloads (cumulative): 9,849
Average downloads per article: 895.36
Average citations per article: 32.06


CITED BY

44 Citations


PUBLICATION

· Proceeding
Title: PLDI '02 Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation
General Chair: Jens Knoop, Universität Dortmund, Germany
Program Chair: Laurie J. Hendren, McGill University, Canada
Pages: 117-128
Publication date: 2002-06-17
Sponsor: SIGPLAN (ACM Special Interest Group on Programming Languages)
Publisher: ACM, New York, NY, USA, ©2002
ISBN: 1-58113-463-0; Order Number: 548020; DOI: 10.1145/512529.512544
Conference: PLDI (Programming Language Design and Implementation)
Paper acceptance rate: 28 of 169 submissions, 17%
Overall acceptance rate: 711 of 3,503 submissions, 20%
Year      Submitted  Accepted  Rate
PLDI '95  105        28        27%
PLDI '96  112        28        25%
PLDI '97  158        31        20%
PLDI '98  136        31        23%
PLDI '99  130        26        20%
PLDI '00  173        30        17%
PLDI '01  144        30        21%
PLDI '02  169        28        17%
PLDI '03  131        28        21%
PLDI '04  127        25        20%
PLDI '05  135        28        21%
PLDI '06  174        36        21%
PLDI '07  178        45        25%
PLDI '08  184        34        18%
PLDI '09  196        41        21%
PLDI '10  206        41        20%
PLDI '11  236        55        23%
PLDI '12  255        48        19%
PLDI '13  267        46        17%
PLDI '14  287        52        18%
Overall   3,503      711       20%
· Newsletter
Title: ACM SIGPLAN Notices
Volume 37, Issue 5, May 2002
Pages: 117-128
Publication date: 2002-05-17
Sponsor: SIGPLAN (ACM Special Interest Group on Programming Languages)
Publisher: ACM, New York, NY, USA
ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/543552.512544

APPEARS IN
Software
Performance

REVIEWS

Reviews are not available for this item.


TABLE OF CONTENTS

Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation
SESSION: Type Systems
Session Chair: Oege de Moor
Flow-sensitive type qualifiers
Jeffrey S. Foster, Tachio Terauchi, Alex Aiken
Pages: 1-12
DOI: 10.1145/512529.512531

We present a system for extending standard type systems with flow-sensitive type qualifiers. Users annotate their programs with type qualifiers, and inference checks that the annotations are correct. In our system only the type qualifiers are modeled ...
Adoption and focus: practical linear types for imperative programming
Manuel Fahndrich, Robert DeLine
Pages: 13-24
DOI: 10.1145/512529.512532

A type system with linearity is useful for checking software protocols and resource management at compile time. Linearity provides powerful reasoning about state changes, but at the price of restrictions on aliasing. The hard division between linear and ...
SESSION: Register Allocation and Value Numbering
Session Chair: Rajiv Gupta
Fast copy coalescing and live-range identification
Zoran Budimlic, Keith D. Cooper, Timothy J. Harvey, Ken Kennedy, Timothy S. Oberg, Steven W. Reeves
Pages: 25-32
DOI: 10.1145/512529.512534

This paper presents a fast new algorithm for modeling and reasoning about interferences for variables in a program without constructing an interference graph. It then describes how to use this information to minimize copy insertion for φ-node instantiation ...
Preference-directed graph coloring
Akira Koseki, Hideaki Komatsu, Toshio Nakatani
Pages: 33-44
DOI: 10.1145/512529.512535

This paper describes a new framework of register allocation based on Chaitin-style coloring. Our focus is on maximizing the chances for live ranges to be allocated to the most preferred registers while not destroying the colorability obtained by graph ...
A sparse algorithm for predicated global value numbering
Karthik Gargi
Pages: 45-56
DOI: 10.1145/512529.512536

This paper presents a new algorithm for performing global value numbering on a routine in static single assignment form. Our algorithm has all the strengths of the most powerful existing practical methods of global value numbering; it unifies optimistic ...
SESSION: Program Correctness
Session Chair: Rita Loogen
ESP: path-sensitive program verification in polynomial time
Manuvir Das, Sorin Lerner, Mark Seigle
Pages: 57-68
DOI: 10.1145/512529.512538

In this paper, we present a new algorithm for partial program verification that runs in polynomial time and space. We are interested in checking that a program satisfies a given temporal safety property. Our insight is that by accurately modeling only ...
A system and language for building system-specific, static analyses
Seth Hallem, Benjamin Chelf, Yichen Xie, Dawson Engler
Pages: 69-82
DOI: 10.1145/512529.512539

This paper presents a novel approach to bug-finding analysis and an implementation of that approach. Our goal is to find as many serious bugs as possible. To do so, we designed a flexible, easy-to-use extension language for specifying analyses and an ...
Deriving specialized program analyses for certifying component-client conformance
G. Ramalingam, Alex Warshavsky, John Field, Deepak Goyal, Mooly Sagiv
Pages: 83-94
DOI: 10.1145/512529.512540

We are concerned with the problem of statically certifying (verifying) whether the client of a software component conforms to the component's constraints for correct usage. We show how conformance certification can be efficiently carried out in ...
SESSION: Profiling and Speculation
Session Chair: Barbara Ryder
Profile-guided code compression
Saumya Debray, William Evans
Pages: 95-105
DOI: 10.1145/512529.512542

As computers are increasingly used in contexts where the amount of available memory is limited, it becomes important to devise techniques that reduce the memory footprint of application programs while leaving them in an executable form. This paper describes ...
Profile-directed optimization of event-based programs
Mohan Rajagopalan, Saumya K. Debray, Matti A. Hiltunen, Richard D. Schlichting
Pages: 106-116
DOI: 10.1145/512529.512543

Events are used as a fundamental abstraction in programs ranging from graphical user interfaces (GUIs) to systems for building customized network protocols. While providing a flexible structuring and execution paradigm, events have the potentially serious ...
Post-pass binary adaptation for software-based speculative precomputation
Steve S.W. Liao, Perry H. Wang, Hong Wang, Gerolf Hoflehner, Daniel Lavery, John P. Shen
Pages: 117-128
DOI: 10.1145/512529.512544

Recently, a number of thread-based prefetching techniques have been proposed. These techniques aim at improving the latency of single-threaded applications by leveraging multithreading resources to perform memory prefetching via speculative prefetch ...
SESSION: Garbage Collection
Session Chair: David Detlefs
A parallel, incremental and concurrent GC for servers
Yoav Ossia, Ori Ben-Yitzhak, Irit Goft, Elliot K. Kolodner, Victor Leikehman, Avi Owshanko
Pages: 129-140
DOI: 10.1145/512529.512546

Multithreaded applications with multi-gigabyte heaps running on modern servers provide new challenges for garbage collection (GC). The challenges for "server-oriented" GC include: ensuring short pause times on a multi-gigabyte heap, while minimizing ...
Combining region inference and garbage collection
Niels Hallenberg, Martin Elsman, Mads Tofte
Pages: 141-152
DOI: 10.1145/512529.512547

This paper describes a memory discipline that combines region-based memory management and copying garbage collection by extending Cheney's copying garbage collection algorithm to work with regions. The paper presents empirical evidence that region inference ...
Beltway: getting around garbage collection gridlock
Stephen M Blackburn, Richard Jones, Kathryn S. McKinley, J Eliot B Moss
Pages: 153-164
DOI: 10.1145/512529.512548

We present the design and implementation of a new garbage collection framework that significantly generalizes existing copying collectors. The Beltway framework exploits and separates object age and incrementality. It groups objects in one or ...
SESSION: Hardware-Conscious Optimizations
Session Chair: Yanhong Annie Liu
A compiler approach to fast hardware design space exploration in FPGA-based systems
Byoungro So, Mary W. Hall, Pedro C. Diniz
Pages: 165-176
DOI: 10.1145/512529.512550

The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their hardware implementation, designers manually apply loop transformations such ...
Space-time trade-off optimization for a class of electronic structure calculations
Daniel Cociorva, Gerald Baumgartner, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, Marcel Nooijen, David E. Bernholdt, Robert Harrison
Pages: 177-186
DOI: 10.1145/512529.512551

The accurate modeling of the electronic structure of atoms and molecules is very computationally intensive. Many models of electronic structure, such as the Coupled Cluster approach, involve collections of tensor contractions. There are usually a large ...
Effective sign extension elimination
Motohiro Kawahito, Hideaki Komatsu, Toshio Nakatani
Pages: 187-198
DOI: 10.1145/512529.512552

Computer designs are shifting from 32-bit architectures to 64-bit architectures, while most of the programs available today are still designed for 32-bit architectures. Java™, for example, specifies the frequently used "int" as a 32-bit data type. ...
SESSION: Dynamic Prefetching & Cache Optimizations
Session Chair: Michal Cierniak
Dynamic hot data stream prefetching for general-purpose programs
Trishul M. Chilimbi, Martin Hirzel
Pages: 199-209
DOI: 10.1145/512529.512554

Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited ...
Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching
Youfeng Wu
Pages: 210-221
DOI: 10.1145/512529.512555

Irregular data references are difficult to prefetch, as the future memory address of a load instruction is hard to anticipate by a compiler. However, recent studies as well as our experience indicate that some important load instructions in irregular ...
Static load classification for improving the value predictability of data-cache misses
Martin Burtscher, Amer Diwan, Matthias Hauswirth
Pages: 222-233
DOI: 10.1145/512529.512556

While caches are effective at avoiding most main-memory accesses, the few remaining memory references are still expensive. Even one cache miss per one hundred accesses can double a program's execution time. To better tolerate the data-cache miss latency, ...
SESSION: Analysis of Object-Oriented Programs
Session Chair: Jan Vitek
Extended static checking for Java
Cormac Flanagan, K. Rustan M. Leino, Mark Lillibridge, Greg Nelson, James B. Saxe, Raymie Stata
Pages: 234-245
DOI: 10.1145/512529.512558

Software development and maintenance are costly endeavors. The cost can be reduced if more software defects are detected earlier in the development cycle. This paper introduces the Extended Static Checker for Java (ESC/Java), an experimental compile-time ...
Using data groups to specify and check side effects
K. Rustan M. Leino, Arnd Poetzsch-Heffter, Yunhong Zhou
Pages: 246-257
DOI: 10.1145/512529.512559

Reasoning precisely about the side effects of procedure calls is important to many program analyses. This paper introduces a technique for specifying and statically checking the side effects of methods in an object-oriented language. The technique uses ...
Efficient and precise datarace detection for multithreaded object-oriented programs
Jong-Deok Choi, Keunwoo Lee, Alexey Loginov, Robert O'Callahan, Vivek Sarkar, Manu Sridharan
Pages: 258-269
DOI: 10.1145/512529.512560

We present a novel approach to dynamic datarace detection for multithreaded object-oriented programs. Past techniques for on-the-fly datarace detection either sacrificed precision for performance, leading to many false positive datarace reports, or maintained ...
SESSION: Language Design & Implementation Issues
Session Chair: Andrew C. Myers
Maya: multiple-dispatch syntax extension in Java
Jason Baker, Wilson C. Hsieh
Pages: 270-281
DOI: 10.1145/512529.512562

We have designed and implemented Maya, a version of Java that allows programmers to extend and reinterpret its syntax. Maya generalizes macro systems by treating grammar productions as generic functions, and semantic actions on productions as multimethods ...
Region-based memory management in Cyclone
Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, James Cheney
Pages: 282-293
DOI: 10.1145/512529.512563

Cyclone is a type-safe programming language derived from C. The primary design goal of Cyclone is to let programmers control data representation and memory management without sacrificing type-safety. In this paper, we focus on the region-based memory ...
MaJIC: compiling MATLAB for speed and responsiveness
George Almási, David Padua
Pages: 294-303
DOI: 10.1145/512529.512564

This paper presents and evaluates techniques to improve the execution performance of MATLAB. Previous efforts concentrated on source to source translation and batch compilation; MaJIC provides an interactive frontend that looks like MATLAB and ...
SESSION: High Performance & Real-Time Issues
Session Chair: Charles Consel
Denali: a goal-directed superoptimizer
Rajeev Joshi, Greg Nelson, Keith Randall
Pages: 304-314
DOI: 10.1145/512529.512566

This paper provides a preliminary report on a new research project that aims to construct a code generator that uses an automatic theorem prover to produce very high-quality (in fact, nearly mathematically optimal) machine code for modern architectures. ...
The embedded machine: predictable, portable real-time code
Thomas A. Henzinger, Christoph M. Kirsch
Pages: 315-326
DOI: 10.1145/512529.512567

The Embedded Machine is a virtual machine that mediates in real time the interaction between software processes and physical processes. It separates the compilation of embedded programs into two phases. The first, platform-independent compiler phase ...



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.