AUTHORS
Luiz André Barroso
Kourosh Gharachorloo
Robert McNamara
Andreas Nowatzyk
Shaz Qadeer
Barton Sano
Scott Smith
Robert Stets
Ben Verghese
CITED BY
192 Citations
PUBLICATION
Proceeding
Title: ISCA '00 Proceedings of the 27th annual international symposium on Computer architecture
Chairmen: Alan Berenbaum, Lucent Technologies; Joel Emer, Compaq Computer Corp.
Pages: 282-293
Publication Date: 2000-06-10
Sponsor: SIGARCH ACM Special Interest Group on Computer Architecture
Publisher: ACM New York, NY, USA ©2000
ISBN: 1-58113-232-8
Order Number: 415004
doi>10.1145/339647.339696
Conference: ISCA International Symposium on Computer Architecture
Overall Acceptance Rate: 533 of 2,983 submissions, 18%

Newsletter
Title: ACM SIGARCH Computer Architecture News - Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
Volume 28 Issue 2, May 2000
Chairmen: Alan Berenbaum, Lucent Technologies, Berkeley Heights, NJ; Joel Emer, Compaq Computer Corp., Palo Alto, CA
Pages: 282-293
Publication Date: 2000-05-01
Sponsor: SIGARCH ACM Special Interest Group on Computer Architecture
Publisher: ACM New York, NY, USA
ISSN: 0163-5964
doi>10.1145/342001.339696
Table of Contents

A scalable approach to thread-level speculation
J. Greggory Steffan, Christopher B. Colohan, Antonia Zhai, Todd C. Mowry
Pages: 1-12
doi>10.1145/339647.339650
While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the real challenge is how to easily create parallel software ...

Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Marcelo Cintra, José F. Martínez, Josep Torrellas
Pages: 13-24
doi>10.1145/339647.363382
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited ...

Transient fault detection via simultaneous multithreading
Steven K. Reinhardt, Shubhendu S. Mukherjee
Pages: 25-36
doi>10.1145/339647.339652
Smaller feature sizes, reduced voltage levels, higher transistor counts, and reduced noise margins make future generations of microprocessors increasingly prone to transient hardware faults. Most commercial fault-tolerant computers use fully ...

Trace preconstruction
Quinn Jacobson, James E. Smith
Pages: 37-46
doi>10.1145/339647.339653
Trace caches enable high bandwidth, low latency instruction supply, but have a high miss penalty and relatively large working sets. Consequently, their performance may suffer due to capacity and compulsory misses. Trace preconstruction augments a trace ...

Completion time multiple branch prediction for enhancing trace cache performance
Ryan Rakvic, Bryan Black, John Paul Shen
Pages: 47-58
doi>10.1145/339647.339654
The need for multiple branch prediction is inherent to wide instruction fetching. This paper presents a completion time multiple branch predictor called the Tree-based Multiple Branch Predictor (TMP) that builds on previous single branch prediction ...

A hardware mechanism for dynamic extraction and relayout of program hot spots
Matthew C. Merten, Andrew R. Trick, Erik M. Nystrom, Ronald D. Barnes, Wen-mei W. Hwu
Pages: 59-70
doi>10.1145/339647.339655
This paper presents a new mechanism for collecting and deploying runtime optimized code. The code-collecting component resides in the instruction retirement stage and lays out hot execution paths to improve instruction fetch rate as well as enable further ...

HLS: combining statistical and symbolic simulation to guide microprocessor designs
Mark Oskin, Frederic T. Chong, Matthew Farrens
Pages: 71-82
doi>10.1145/339647.339656
As microprocessors continue to evolve, many optimizations reach a point of diminishing returns. We introduce HLS, a hybrid processor simulator which uses statistical models and symbolic execution to evaluate design alternatives. This simulation ...

Wattch: a framework for architectural-level power analysis and optimizations
David Brooks, Vivek Tiwari, Margaret Martonosi
Pages: 83-94
doi>10.1145/339647.339657
Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most ...

Energy-driven integrated hardware-software optimizations using SimplePower
N. Vijaykrishnan, M. Kandemir, M. J. Irwin, H. S. Kim, W. Ye
Pages: 95-106
doi>10.1145/339647.339659
With the emergence of a plethora of embedded and portable applications, energy dissipation has joined throughput, area, and accuracy/precision as a major design constraint. Thus, designers must be concerned with both optimizing and estimating the energy ...

A fully associative software-managed cache design
Erik G. Hallnor, Steven K. Reinhardt
Pages: 107-116
doi>10.1145/339647.339660
As DRAM access latencies approach a thousand instruction-execution times and on-chip caches grow to multiple megabytes, it is not clear that conventional cache structures continue to be appropriate. Two key features, full associativity and software ...

Recency-based TLB preloading
Ashley Saulsbury, Fredrik Dahlgren, Per Stenström
Pages: 117-127
doi>10.1145/339647.339666
Caching and other latency tolerating techniques have been quite successful in maintaining high memory system performance for general purpose processors. However, TLB misses have become a serious bottleneck as working sets are growing beyond the ...

Memory access scheduling
Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, John D. Owens
Pages: 128-138
doi>10.1145/339647.339668
The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order ...

Selective, accurate, and timely self-invalidation using last-touch prediction
An-Chow Lai, Babak Falsafi
Pages: 139-148
doi>10.1145/339647.339669
Communication in cache-coherent distributed shared memory (DSM) often requires invalidating (or writing back) cached copies of a memory block, incurring high overheads. This paper proposes Last-Touch Predictors (LTPs) that learn ...

An embedded DRAM architecture for large-scale spatial-lattice computations
Norman Margolus
Pages: 149-160
doi>10.1145/339647.339672
Spatial-lattice computations with finite-range interactions are an important class of easily parallelized computations. This class includes many simple and direct algorithms for physical simulation, virtual-reality simulation, agent-based modeling, logic ...

Smart Memories: a modular reconfigurable architecture
Ken Mai, Tim Paaske, Nuwan Jayasena, Ron Ho, William J. Dally, Mark Horowitz
Pages: 161-171
doi>10.1145/339647.339673
Trends in VLSI technology scaling demand that future computing devices be narrowly focused to achieve high performance and high efficiency, yet also target the high volumes and low costs of widely applicable general purpose designs. To address these ...

Understanding the backward slices of performance degrading instructions
Craig B. Zilles, Gurindar S. Sohi
Pages: 172-181
doi>10.1145/339647.339676
For many applications, branch mispredictions and cache misses limit a processor's performance to a level well below its peak instruction throughput. A small fraction of static instructions, whose behavior cannot be anticipated using current branch ...

On the value locality of store instructions
Kevin M. Lepak, Mikko H. Lipasti
Pages: 182-191
doi>10.1145/339647.339678
Value locality, a recently discovered program attribute that describes the likelihood of the recurrence of previously-seen program values, has been studied enthusiastically in the recent published literature. Much of the energy has focused on refining ...

Performance analysis of the Alpha 21264-based Compaq ES40 system
Zarka Cvetanovic, R. E. Kessler
Pages: 192-202
doi>10.1145/339647.339680
This paper evaluates performance characteristics of the Compaq ES40 shared memory multiprocessor. The ES40 system contains up to four Alpha 21264 CPU's together with a high-performance memory system. We qualitatively describe architectural features ...

Lx: a technology platform for customizable VLIW embedded processing
Paolo Faraboschi, Geoffrey Brown, Joseph A. Fisher, Giuseppe Desoli, Fred Homewood
Pages: 203-213
doi>10.1145/339647.339682
Lx is a scalable and customizable VLIW processor technology platform designed by Hewlett-Packard and STMicroelectronics that allows variations in instruction issue width, the number and capabilities of structures and the processor instruction set. For ...

Reconfigurable caches and their application to media processing
Parthasarathy Ranganathan, Sarita Adve, Norman P. Jouppi
Pages: 214-224
doi>10.1145/339647.339685
High performance general-purpose processors are increasingly being used for a variety of application domains - scientific, engineering, databases, and more recently, media processing. It is therefore important to ensure that architectural features ...

CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
Zhi Alex Ye, Andreas Moshovos, Scott Hauck, Prithviraj Banerjee
Pages: 225-235
doi>10.1145/339647.339687
Reconfigurable hardware has the potential for significant performance improvements by providing support for application-specific operations. We report our experience with Chimaera, a prototype system that integrates a small and fast reconfigurable ...

Circuits for wide-window superscalar processors
Dana S. Henry, Bradley C. Kuszmaul, Gabriel H. Loh, Rahul Sami
Pages: 236-247
doi>10.1145/339647.339689
Our program benchmarks and simulations of novel circuits indicate that large-window processors are feasible. Using our redesigned superscalar components, a large-window processor implemented in today's technology can achieve an increase of 10-60% (geometric ...

Clock rate versus IPC: the end of the road for conventional microarchitectures
Vikas Agarwal, M. S. Hrishikesh, Stephen W. Keckler, Doug Burger
Pages: 248-259
doi>10.1145/339647.339691
The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, due to both diminishing ...

Vector instruction set support for conditional operations
J. E. Smith, Greg Faanes, Rabin Sugumar
Pages: 260-269
doi>10.1145/339647.339693
Vector instruction sets are receiving renewed interest because of their applicability to multimedia. Current multimedia instruction sets use short vectors with SIMD implementations, but long vector, pipelined implementations have a number of advantages ...

Instruction path coprocessors
Yuan Chou, John Paul Shen
Pages: 270-281
doi>10.1145/339647.339694
This paper presents the concept of an Instruction Path Coprocessor (I-COP), which is a programmable on-chip coprocessor, with its own mini-instruction set, that operates on the core processor's instructions to transform them into an internal ...

Piranha: a scalable architecture based on single-chip multiprocessing
Luiz André Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, Ben Verghese
Pages: 282-293
doi>10.1145/339647.339696
The microprocessor industry is currently struggling with higher development costs and longer design times that arise from exceedingly complex processors that are pushing the limits of instruction-level parallelism. Meanwhile, such designs are especially ...

Allowing for ILP in an embedded Java processor
Ramesh Radhakrishnan, Deependra Talla, Lizy Kurian John
Pages: 294-305
doi>10.1145/339647.339702
Java processors are ideal for embedded and network computing applications such as Internet TV's, set-top boxes, smart phones, and other consumer electronics applications. In this paper, we investigate cost-effective microarchitectural techniques ...

Early load address resolution via register tracking
Michael Bekerman, Adi Yoaz, Freddy Gabbay, Stephan Jourdan, Maxim Kalaev, Ronny Ronen
Pages: 306-315
doi>10.1145/339647.339705
Higher microprocessor frequencies accentuate the performance cost of memory accesses. This is especially noticeable in the Intel's IA32 architecture where lack of registers results in increased number of memory accesses. This paper presents novel, ...

Multiple-banked register file architectures
José-Lorenzo Cruz, Antonio González, Mateo Valero, Nigel P. Topham
Pages: 316-325
doi>10.1145/339647.339708
The register file access time is one of the critical delays in current superscalar processors. Its impact on processor performance is likely to increase in future processor generations, as they are expected to increase the issue width (which implies ...