
ABSTRACT

Increasing integrated-circuit pin bandwidth has motivated a corresponding increase in the degree or radix of interconnection networks and their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On benign (load-balanced) traffic, the flattened butterfly approaches the cost/performance of a butterfly network and has roughly half the cost of a comparable-performance Clos network. The advantage over the Clos is achieved by eliminating redundant hops when they are not needed for load balance. On adversarial traffic, the flattened butterfly matches the cost/performance of a folded-Clos network and provides an order of magnitude better performance than a conventional butterfly. In this case, global adaptive routing is used to switch the flattened butterfly from minimal to non-minimal routing - using redundant hops only when they are needed. Minimal and non-minimal, oblivious and adaptive routing algorithms are evaluated on the flattened butterfly. We show that load-balancing adversarial traffic requires non-minimal globally-adaptive routing and show that sequential allocators are required to avoid transient load imbalance when using adaptive routing algorithms. We also compare the cost of the flattened butterfly to folded-Clos, hypercube, and butterfly networks with identical capacity and show that the flattened butterfly is more cost-efficient than folded-Clos and hypercube topologies.
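
The abstract contrasts minimal routing with non-minimal routing that spends redundant hops, but it does not spell out the topology itself. The sketch below is only an illustration under stated assumptions: routers are labelled by tuples of base-k digits, and any two routers whose labels differ in exactly one digit are directly linked (the construction usually described for flattened butterflies). The function names are hypothetical, the intermediate router for the non-minimal route is supplied by the caller (a Valiant-style scheme would pick it at random), and adaptive routing decisions and allocators are not modelled.

```python
from itertools import product


def routers(k, dims):
    """All router labels: tuples of `dims` digits, each in range(k) (assumed labelling)."""
    return list(product(range(k), repeat=dims))


def neighbors(r, k):
    """Routers whose label differs from r in exactly one digit (assumed direct links)."""
    out = []
    for d in range(len(r)):
        for v in range(k):
            if v != r[d]:
                out.append(r[:d] + (v,) + r[d + 1:])
    return out


def minimal_route(src, dst):
    """Correct differing digits in dimension order; returns the routers visited."""
    path = [src]
    cur = list(src)
    for d, target in enumerate(dst):
        if cur[d] != target:
            cur[d] = target
            path.append(tuple(cur))
    return path


def non_minimal_route(src, dst, mid):
    """Route minimally to an intermediate router, then minimally to dst."""
    return minimal_route(src, mid) + minimal_route(mid, dst)[1:]


if __name__ == "__main__":
    k, dims = 4, 2
    print(len(routers(k, dims)), "routers,", len(neighbors((0, 0), k)), "links each")
    print("minimal:    ", minimal_route((0, 0), (0, 1)))
    print("non-minimal:", non_minimal_route((0, 0), (0, 1), (2, 3)))
```

In this toy setting the minimal route between adjacent routers takes one hop, while the detour through an intermediate router takes more; that extra hop count is the cost that the abstract says should be paid only when load balance requires it.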

AUTHORS



John Kim

Bibliometrics: publication history
Publication years 2005-2016
Publication count 33
Citation Count 598
Available for download 24
Downloads (6 Weeks) 94
Downloads (12 Months) 1,334
Downloads (cumulative) 13,726
Average downloads per article 571.92
Average citations per article 18.12


William J. Dally

Bibliometrics: publication history
Publication years 1987-2013
Publication count 48
Citation Count 783
Available for download 30
Downloads (6 Weeks) 112
Downloads (12 Months) 2,669
Downloads (cumulative) 43,134
Average downloads per article 1,437.80
Average citations per article 16.31


Dennis Abts

dabtsatgoogle.com
Bibliometrics: publication history
Publication years 1999-2014
Publication count 21
Citation Count 355
Available for download 14
Downloads (6 Weeks) 74
Downloads (12 Months) 886
Downloads (cumulative) 11,099
Average downloads per article 792.79
Average citations per article 16.90

REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1. Amphenol. http://www.amphenol.com/.
2. L. N. Bhuyan and D. P. Agrawal. Generalized hypercube and hyperbus structures for a computer network. IEEE Trans. Computers, 33(4):323-333, 1984.
3. K.-Y. K. Chang et al. A 0.4-4-Gb/s CMOS Quad Transceiver Cell Using On-Chip Regulated Dual-Loop PLLs. IEEE Journal of Solid-State Circuits, 38(5):747-754, 2003.
4. C. Clos. A Study of Non-Blocking Switching Networks. The Bell System Technical Journal, 32(2):406-424, March 1953.
5. Cray XT3. http://www.cray.com/products/systems/xt3/.
8. W. J. Dally, P. P. Carvey, and L. R. Dennison. The Avici Terabit Switch/Router. In Proc. of Hot Interconnects, pages 41-50, August 1998.
10. H. G. Dietz and T. I. Mattox. Compiler techniques for flat neighborhood networks. In 13th International Workshop on Languages and Compilers for Parallel Computing, pages 420-431, Yorktown Heights, New York, 2000.
11. Gore. http://www.gore.com/electronics.
15. C. P. Kruskal and M. Snir. The performance of multistage interconnection networks for multiprocessors. IEEE Trans. Computers, 32(12):1091-1098, 1983.
19. Microprocessor Report. http://www.mdronline.com/.
21. G. Pautsch. Thermal Challenges in the Next Generation of Supercomputers. CoolCon, 2005.
22. G. Pfister. An Introduction to the InfiniBand Architecture (http://www.infinibandta.org). IEEE Press, 2001.
24. S. Scott and G. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In Hot Chips 4, Stanford, CA, Aug. 1996.
26. H. J. Siegel. A model of SIMD machines and a comparison of various interconnection networks. IEEE Trans. Computers, 28(12):907-917, 1979.
27. A. Singh. Load-Balanced Routing in Interconnection Networks. PhD thesis, Stanford University, 2005.
30. L. G. Valiant. A scheme for fast parallel communication. SIAM Journal on Computing, 11(2):350-361, 1982.
32. K.-L. J. Wong, H. Hatamkhani, M. Mansuri, and C.-K. K. Yang. A 27-mW 3.6-Gb/s I/O Transceiver. IEEE Journal of
33. S. Young and S. Yalamanchili. Adaptive routing in generalized hypercube architectures. In Proc. of the IEEE Symposium on Parallel and Distributed Processing, pages 564-571, Dallas, TX, Dec. 1991.

CITED BY

63 Citations

INDEX TERMS

The ACM Computing Classification System (CCS rev.2012)


PUBLICATION

· Proceeding
Title ISCA '07 Proceedings of the 34th annual international symposium on Computer architecture
General Chairs Dean Tullsen University of California, San Diego
Program Chairs Brad Calder Microsoft & University of California, San Diego
Pages 126-137
Publication Date 2007-06-09 (yyyy-mm-dd)
Sponsors SIGARCH ACM Special Interest Group on Computer Architecture
IEEE-CS Computer Society
Publisher ACM New York, NY, USA ©2007
ISBN: 978-1-59593-706-3 Order Number: 417070 doi>10.1145/1250662.1250679
Conference ISCA International Symposium on Computer Architecture
Paper Acceptance Rate 46 of 204 submissions, 23%
Overall Acceptance Rate 533 of 2,983 submissions, 18%
Year       Submitted  Accepted  Rate
ISCA '99   135        26        19%
ISCA '01   163        24        15%
ISCA '02   180        27        15%
ISCA '03   184        36        20%
ISCA '04   217        31        14%
ISCA '05   194        45        23%
ISCA '06   234        31        13%
ISCA '07   204        46        23%
ISCA '08   259        37        14%
ISCA '09   210        43        20%
ISCA '10   245        44        18%
ISCA '11   208        40        19%
ISCA '12   262        47        18%
ISCA '13   288        56        19%
Overall    2,983      533       18%
· Newsletter
Title ACM SIGARCH Computer Architecture News
Volume 35 Issue 2, May 2007
Pages 126-137
Publication Date 2007-06-09 (yyyy-mm-dd)
Sponsor SIGARCH ACM Special Interest Group on Computer Architecture
Publisher ACM New York, NY, USA
ISSN: 0163-5964 doi>10.1145/1273440.1250679

APPEARS IN
Hardware Design
Performance

Table of Contents

Proceedings of the 34th annual international symposium on Computer architecture
SESSION: Special purpose to warehouse computers
N. Jouppi
Anton, a special-purpose machine for molecular dynamics simulation
David E. Shaw, Martin M. Deneroff, Ron O. Dror, Jeffrey S. Kuskin, Richard H. Larson, John K. Salmon, Cliff Young, Brannon Batson, Kevin J. Bowers, Jack C. Chao, Michael P. Eastwood, Joseph Gagliardo, J. P. Grossman, C. Richard Ho, Douglas J. Ierardi, István Kolossváry, John L. Klepeis, Timothy Layman, Christine McLeavey, Mark A. Moraes, Rolf Mueller, Edward C. Priest, Yibing Shan, Jochen Spengler, Michael Theobald, Brian Towles, Stanley C. Wang
Pages: 1-12
doi>10.1145/1250662.1250664

The ability to perform long, accurate molecular dynamics (MD) simulations involving proteins and other biological macro-molecules could in principle provide answers to some of the most important currently outstanding questions in the fields of biology, ...
Power provisioning for a warehouse-sized computer
Xiaobo Fan, Wolf-Dietrich Weber, Luiz Andre Barroso
Pages: 13-23
doi>10.1145/1250662.1250665

Large-scale Internet services require a computing infrastructure that can be appropriately described as a warehouse-sized computing system. The cost of building datacenter facilities capable of delivering a given power capacity to such a computer can rival ...
SESSION: Transactions and synchronization
K. Asanovic
Making the fast case common and the uncommon case simple in unbounded transactional memory
Colin Blundell, Joe Devietti, E. Christopher Lewis, Milo M. K. Martin
Pages: 24-34
doi>10.1145/1250662.1250667

Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, allowing programmers to exploit more effectively the soon-to-be-ubiquitous multi-core designs. Several recent proposals have extended ...
Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures
Weirong Zhu, Vugranam C Sreedhar, Ziang Hu, Guang R. Gao
Pages: 35-45
doi>10.1145/1250662.1250668

Efficient fine-grain synchronization is extremely important to effectively harness the computational power of many-core architectures. However, designing and implementing fine-grain synchronization in such architectures presents several challenges, including ...
SESSION: Virtual caches and hierarchies
M. Martonosi
Virtual hierarchies to support server consolidation
Michael R. Marty, Mark D. Hill
Pages: 46-56
doi>10.1145/1250662.1250670

Server consolidation is becoming an increasingly popular technique to manage and utilize systems. This paper develops CMP memory systems for server consolidation where most sharing occurs within Virtual Machines (VMs). Our memory systems maximize ...
Virtual private caches
Kyle J. Nesbit, James Laudon, James E. Smith
Pages: 57-68
doi>10.1145/1250662.1250671

Virtual Private Machines (VPM) provide a framework for Quality of Service (QoS) in CMP-based computer systems. VPMs incorporate microarchitecture mechanisms that allow shares of hardware resources to be allocated to executing threads, thus providing ...
SESSION: Transactions
M. Tremblay
An effective hybrid transactional memory system with strong isolation guarantees
Chi Cao Minh, Martin Trautmann, JaeWoong Chung, Austen McDonald, Nathan Bronson, Jared Casper, Christos Kozyrakis, Kunle Olukotun
Pages: 69-80
doi>10.1145/1250662.1250673

We propose signature-accelerated transactional memory (SigTM), a hybrid TM system that reduces the overhead of software transactions. SigTM uses hardware signatures to track the read-set and write-set for pending transactions and perform conflict detection ...
Performance pathologies in hardware transactional memory
Jayaram Bobba, Kevin E. Moore, Haris Volos, Luke Yen, Mark D. Hill, Michael M. Swift, David A. Wood
Pages: 81-91
doi>10.1145/1250662.1250674

Hardware Transactional Memory (HTM) systems reflect choices from three key design dimensions: conflict detection, version management, and conflict resolution. Previously proposed HTMs represent three points in this design space: lazy conflict detection, ...
MetaTM/TxLinux: transactional memory for an operating system
Hany E. Ramadan, Christopher J. Rossbach, Donald E. Porter, Owen S. Hofmann, Aditya Bhandari, Emmett Witchel
Pages: 92-103
doi>10.1145/1250662.1250675

This paper quantifies the effect of architectural design decisions on the performance of TxLinux. TxLinux is a Linux kernel modified to use transactions in place of locking primitives in several key subsystems. We run TxLinux on MetaTM, which is a new hardware transaction ...
An integrated hardware-software approach to flexible transactional memory
Arrvindh Shriraman, Michael F. Spear, Hemayet Hossain, Virendra J. Marathe, Sandhya Dwarkadas, Michael L. Scott
Pages: 104-115
doi>10.1145/1250662.1250676

There has been considerable recent interest in both hardware and software transactional memory (TM). We present an intermediate approach, in which hardware serves to accelerate a TM implementation controlled fundamentally by software. Specifically, we ...
SESSION: Networks and routers
M. Taylor
Rotary router: an efficient architecture for CMP interconnection networks
Pablo Abad, Valentin Puente, José Angel Gregorio, Pablo Prieto
Pages: 116-125
doi>10.1145/1250662.1250678

The trend towards increasing the number of processor cores and cache capacity in future Chip-Multiprocessors (CMPs) will require scalable packet-switched interconnection networks adapted to the restrictions imposed by the CMP environment. This paper ...
Flattened butterfly: a cost-efficient topology for high-radix networks
John Kim, William J. Dally, Dennis Abts
Pages: 126-137
doi>10.1145/1250662.1250679

Increasing integrated-circuit pin bandwidth has motivated a corresponding increase in the degree or radix of interconnection networks and their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. ...
A novel dimensionally-decomposed router for on-chip communication in 3D architectures
Jongman Kim, Chrysostomos Nicopoulos, Dongkook Park, Reetuparna Das, Yuan Xie, Vijaykrishnan Narayanan, Mazin S. Yousif, Chita R. Das
Pages: 138-149
doi>10.1145/1250662.1250680

Much like multi-storey buildings in densely packed metropolises, three-dimensional (3D) chip structures are envisioned as a viable solution to skyrocketing transistor densities and burgeoning die sizes in multi-core architectures. Partitioning a larger ...
Express virtual channels: towards the ideal interconnection fabric
Amit Kumar, Li-Shiuan Peh, Partha Kundu, Niraj K. Jha
Pages: 150-161
doi>10.1145/1250662.1250681

Due to wire delay scalability and bandwidth limitations inherent in shared buses and dedicated links, packet-switched on-chip interconnection networks are fast emerging as the pervasive communication fabric to connect different processing elements in ...
SESSION: Atomic regions and fine-grained parallelism
M. Martin
Carbon: architectural support for fine-grained parallelism on chip multiprocessors
Sanjeev Kumar, Christopher J. Hughes, Anthony Nguyen
Pages: 162-173
doi>10.1145/1250662.1250683

Chip multiprocessors (CMPs) are now commonplace, and the number of cores on a CMP is likely to grow steadily. However, in order to harness the additional compute resources of a CMP, applications must expose their thread-level parallelism to the hardware. ...
Hardware atomicity for reliable software speculation
Naveen Neelakantam, Ravi Rajwar, Suresh Srinivas, Uma Srinivasan, Craig Zilles
Pages: 174-185
doi>10.1145/1250662.1250684

Speculative compiler optimizations are effective in improving both single-thread performance and reducing power consumption, but their implementation introduces significant complexity, which can limit their adoption, limit their optimization scope, and ...
SESSION: Core fusion and quantum
D. Burger
Core fusion: accommodating software diversity in chip multiprocessors
Engin Ipek, Meyrem Kirman, Nevin Kirman, Jose F. Martinez
Pages: 186-197
doi>10.1145/1250662.1250686

This paper presents core fusion, a reconfigurable chip multiprocessor (CMP) architecture where groups of fundamentally independent cores can dynamically morph into a larger CPU, or they can be used as distinct processing elements, as needed at ...
Tailoring quantum architectures to implementation style: a quantum computer for mobile and persistent qubits
Eric Chi, Stephen A. Lyon, Margaret Martonosi
Pages: 198-209
doi>10.1145/1250662.1250687

In recent years, quantum computing (QC) research has moved from the realm of theoretical physics and mathematics into real implementations. With many different potential hardware implementations, quantum computer architecture is a rich field with an ...
SESSION: Streams to physics processors
B. Dally
A 64-bit stream processor architecture for scientific applications
Xuejun Yang, Xiaobo Yan, Zuocheng Xing, Yu Deng, Jiang Jiang, Ying Zhang
Pages: 210-219
doi>10.1145/1250662.1250689

Stream architecture is a novel microprocessor architecture with wide application potential. But as for whether it can be used efficiently in scientific computing, many issues await further study. This paper first gives the design and implementation of ...
Physical simulation for animation and visual effects: parallelization and characterization for chip multiprocessors
Christopher J. Hughes, Radek Grzeszczuk, Eftychios Sifakis, Daehyun Kim, Sanjeev Kumar, Andrew P. Selle, Jatin Chhugani, Matthew Holliman, Yen-Kuang Chen
Pages: 220-231
doi>10.1145/1250662.1250690

We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications ...
ParallAX: an architecture for real-time physics
Thomas Y. Yeh, Petros Faloutsos, Sanjay J. Patel, Glenn Reinman
Pages: 232-243
doi>10.1145/1250662.1250691

Future interactive entertainment applications will feature the physical simulation of thousands of interacting objects using explosions, breakable objects, and cloth effects. While these applications require a tremendous amount of performance to satisfy ...
SESSION: Bricks, mortars, and microfluidics
T. Sherwood
Architectural implications of brick and mortar silicon manufacturing
Martha Mercaldi Kim, Mojtaba Mehrara, Mark Oskin, Todd Austin
Pages: 244-253
doi>10.1145/1250662.1250693

We introduce a novel chip fabrication technique called "brick and mortar", in which chips are made from small, pre-fabricated ASIC bricks and bonded in a designer-specified arrangement to an inter-brick communication backbone chip. The goal of brick ...
Aquacore: a programmable architecture for microfluidics
Ahmed M. Amin, Mithuna Thottethodi, T. N. Vijaykumar, Steven Wereley, Stephen C. Jacobson
Pages: 254-265
doi>10.1145/1250662.1250694

Advances in microfluidic research have enabled lab-on-a-chip (LoC) technology to achieve miniaturization and integration of biological and chemical analyses to a single chip comprising channels, valves, mixers, heaters, separators, and sensors. These ...
SESSION: Memory consistency
M. Hill
Mechanisms for store-wait-free multiprocessors
Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
Pages: 266-277
doi>10.1145/1250662.1250696

Store misses cause significant delays in shared-memory multiprocessors because of limited store buffering and ordering constraints required for proper synchronization. Today, programmers must choose from a spectrum of memory consistency models that reduce ...
BulkSC: bulk enforcement of sequential consistency
Luis Ceze, James Tuck, Pablo Montesinos, Josep Torrellas
Pages: 278-289
doi>10.1145/1250662.1250697

While Sequential Consistency (SC) is the most intuitive memory consistency model and the one most programmers likely assume, current multiprocessors do not support it. Instead, they support more relaxed models that deliver high performance. SC implementations ...
SESSION: Power and thermal
P. Ranganathan
Limiting the power consumption of main memory
Bruno Diniz, Dorgival Guedes, Wagner Meira, Jr., Ricardo Bianchini
Pages: 290-301
doi>10.1145/1250662.1250699

The peak power consumption of hardware components affects their power supply, packaging, and cooling requirements. When the peak power consumption is high, the hardware components or the systems that use them can become expensive and bulky. Given that ...
Power model validation through thermal measurements
Francisco Javier Mesa-Martinez, Joseph Nayfach-Battilana, Jose Renau
Pages: 302-311
doi>10.1145/1250662.1250700

Simulation environments are an indispensable tool in the design, prototyping, performance evaluation, and analysis of computer systems. Simulators must be able to faithfully reflect the behavior of the system being analyzed. To ensure the accuracy of the ...
Thermal modeling and management of DRAM memory systems
Jiang Lin, Hongzhong Zheng, Zhichun Zhu, Howard David, Zhao Zhang
Pages: 312-322
doi>10.1145/1250662.1250701

With increasing speed and power density, high-performance memories, including FB-DIMM (Fully Buffered DIMM) and DDR2 DRAM, now begin to require dynamic thermal management (DTM) as processors and hard drives did. The DTM of memories, nevertheless, ...
SESSION: Clocks, scheduling, and stores
T. Austin
ReCycle: pipeline adaptation to tolerate process variation
Abhishek Tiwari, Smruti R. Sarangi, Josep Torrellas
Pages: 323-334
doi>10.1145/1250662.1250703

Process variation affects processor pipelines by making some stages slower and others faster, therefore exacerbating pipeline unbalance. This reduces the frequency attainable by the pipeline. To improve performance, this paper proposes ReCycle, ...
Matrix scheduler reloaded
Peter G. Sassone, Jeff Rupley, II, Edward Brekelbaum, Gabriel H. Loh, Bryan Black
Pages: 335-346
doi>10.1145/1250662.1250704

From multiprocessor scale-up to cache sizes to the number of reorder-buffer entries, microarchitects wish to reap the benefits of more computing resources while staying within power and latency bounds. This tension is quite evident in schedulers, which ...
Late-binding: enabling unordered load-store queues
Simha Sethumadhavan, Franziska Roesner, Joel S. Emer, Doug Burger, Stephen W. Keckler
Pages: 347-357
doi>10.1145/1250662.1250705

Conventional load/store queues (LSQs) are an impediment to both power-efficient execution in superscalar processors and scaling to large-window designs. In this paper, we propose techniques to improve the area and power efficiency of LSQs by allocating ...
SESSION: Memory and caches
L. Barroso
Comparing memory systems for chip multiprocessors
Jacob Leverich, Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis
Pages: 358-368
doi>10.1145/1250662.1250707

There are two basic models for the on-chip memory in CMP systems: hardware-managed coherent caches and software-managed streaming memory. This paper performs a direct comparison of the two models under the same set of assumptions about technology, ...
Interconnect design considerations for large NUCA caches
Naveen Muralimanohar, Rajeev Balasubramonian
Pages: 369-380
doi>10.1145/1250662.1250708

The ever increasing sizes of on-chip caches and the growing domination of wire delay necessitate significant changes to cache hierarchy design methodologies. Many recent proposals advocate splitting the cache into a large number of banks and employing ...
Adaptive insertion policies for high performance caching
Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely, Joel Emer
Pages: 381-391
doi>10.1145/1250662.1250709

The commonly used LRU replacement policy is susceptible to thrashing for memory-intensive workloads that have a working set greater than the available cache size. For such applications, the majority of lines traverse from the MRU position to the LRU ...
SESSION: Experience and methodology
J. Emer
Performance and security lessons learned from virtualizing the alpha processor
Paul A. Karger
Pages: 392-401
doi>10.1145/1250662.1250711

Virtualization has become much more important throughout the computer industry both to improve security and to support multiple workloads on the same hardware with effective isolation between those workloads. The most widely used chip architecture, the ...
Automated design of application specific superscalar processors: an analytical approach
Tejas S. Karkhanis, James E. Smith
Pages: 402-411
doi>10.1145/1250662.1250712

Analytical modeling is applied to the automated design of application-specific superscalar processors. Using an analytical method bridges the gap between the size of the design space and the time required for detailed cycle-accurate simulations. The ...
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
Aashish Phansalkar, Ajay Joshi, Lizy K. John
Pages: 412-423
doi>10.1145/1250662.1250713

The recently released SPEC CPU2006 benchmark suite is expected to be used by computer designers and computer architecture researchers for pre-silicon early design analysis. Partial use of benchmark suites by researchers, due to simulation time constraints, ...
SESSION: Control independence and prediction
C. Zilles
VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization
Hyesoon Kim, José A. Joao, Onur Mutlu, Chang Joo Lee, Yale N. Patt, Robert Cohn
Pages: 424-435
doi>10.1145/1250662.1250715

Indirect branches have become increasingly common in modular programs written in modern object-oriented languages and virtual machine based runtime systems. Unfortunately, the prediction accuracy of indirect branches has not improved as much as that ...
Ginger: control independence using tag rewriting
Andrew D. Hilton, Amir Roth
Pages: 436-447
doi>10.1145/1250662.1250716

The negative performance impact of branch mis-predictions can be reduced by exploiting control independence (CI). When a branch mis-predicts, the wrong-path instructions up to the point where control converges with the correct path are selectively squashed ...
Transparent control independence (TCI)
Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg, Haitham H. Akkary
Pages: 448-459
doi>10.1145/1250662.1250717

Superscalar architectures have been proposed that exploit control independence, reducing the performance penalty of branch mispredictions by preserving the work of future misprediction-independent instructions. The essential goal of exploiting control ...
SESSION: Faults
J. Torrellas
Examining ACE analysis reliability estimates using fault-injection
Nicholas J. Wang, Aqeel Mahesri, Sanjay J. Patel
Pages: 460-469
doi>10.1145/1250662.1250719

ACE analysis is a technique to provide an early reliability estimate for microprocessors. ACE analysis couples data from abstract performance models with low level design details to identify and rule out transient faults that will not cause incorrect ...
Configurable isolation: building high availability systems with commodity multi-core processors
Nidhi Aggarwal, Parthasarathy Ranganathan, Norman P. Jouppi, James E. Smith
Pages: 470-481
doi>10.1145/1250662.1250720

High availability is an increasingly important requirement for enterprise systems, often valued more than performance. Systems designed for high availability typically use redundant hardware for error detection and continued uptime in the event of a ...
SESSION: Security
G. Reinman
Raksha: a flexible information flow architecture for software security
Michael Dalton, Hari Kannan, Christos Kozyrakis
Pages: 482-493
doi>10.1145/1250662.1250722

High-level semantic vulnerabilities such as SQL injection and cross-site scripting have surpassed buffer overflows as the most prevalent security exploits. The breadth and diversity of software vulnerabilities demand new security solutions that combine ...
New cache designs for thwarting software cache-based side channel attacks
Zhenghong Wang, Ruby B. Lee
Pages: 494-505
doi>10.1145/1250662.1250723

Software cache-based side channel attacks are a serious new class of threats for computers. Unlike physical side channel attacks that mostly target embedded cryptographic devices, cache-based side channel attacks can also undermine general purpose systems. ...
SESSION: Vulnerabilities
S. Adve
Mechanisms for bounding vulnerabilities of processor structures
Niranjan Kumar Soundararajan, Angshuman Parashar, Anand Sivasubramaniam
Pages: 506-515
doi>10.1145/1250662.1250725

Concern for the increasing susceptibility of processor structures to transient errors has led to several recent research efforts that propose architectural techniques to enhance reliability. However, real systems are typically required to satisfy hard ...
Dynamic prediction of architectural vulnerability from microarchitectural state
Kristen R. Walcott, Greg Humphreys, Sudhanva Gurumurthi
Pages: 516-527
doi>10.1145/1250662.1250726

Transient faults due to particle strikes are a key challenge in microprocessor design. Driven by exponentially increasing transistor counts, per-chip faults are a growing burden. To protect against soft errors, redundancy techniques such as redundant ...



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.