General Chair's Welcome
Page: 3

Executive Committee
Page: 4

Technical Program Committee
Page: 6

1998 Best Paper Award
Page: 9

ACSEE Undergraduate Scholarships
Page: 9

1998 IEEE Fellows
Page: 10

SIGDA Meritorious Service Award
Page: 10

Design Automation Conference Graduate Scholarship Awards
Page: 10

36th Call for Papers
Page: 11

Reviewers
Page: 12

Opening Keynote Address - William J. Spencer
Page: 18

Thursday Keynote Address - George H. Heilmeier
Page: 19
|
Customers, vendors, and universities (panel): determining the future of EDA together |
| |
Thomas Pennino
|
|
Page: 1 |
|
doi>10.1145/277044.277045 |
|
|
|
|
Asynchronous interface specification, analysis and synthesis |
| |
Michael Kishinevsky,
Jordi Cortadella,
Alex Kondratyev
|
|
Pages: 2-7 |
|
doi>10.1145/277044.277046 |
|
Interfaces, by nature, are often asynchronous, since they serve to connect multiple distributed modules/agents without a common clock. However, the most recent developments in the theory of asynchronous design in the areas of specifications, models, analysis, verification, synthesis, technology mapping, timing optimization and performance analysis are not widely known and rarely accepted by industry.
The goal of this tutorial is to fill this gap and to present an overview of one popular systematic design methodology for the design of asynchronous interface controllers. This methodology is based on Petri nets (PN), a formal model that, from the engineering standpoint, is a formalization of timing diagrams (waveforms) and, from the system designer's standpoint, is a concurrent state machine, in which local components can perform independent or interdependent concurrent actions, changing their local states asynchronously. We will introduce this model informally based on a simple example: a VME-bus controller serving reads from a device to a bus and writes from the bus into the device.
|
|
|
Automatic synthesis of interfaces between incompatible protocols |
| |
Roberto Passerone,
James A. Rowson,
Alberto Sangiovanni-Vincentelli
|
|
Pages: 8-13 |
|
doi>10.1145/277044.277047 |
|
At the system level, reusable Intellectual Property (or IP) blocks can be represented abstractly as blocks that exchange messages. The concrete implementations of these IP blocks must exchange the messages through complex signaling protocols. Interfacing between IP blocks that use different signaling protocols is a tedious and error-prone design task. We propose using regular-expression-based protocol descriptions to show how to map the messages onto a signaling protocol. Given two protocols, an algorithm is proposed to build an interface machine. We have implemented our algorithm in a program named PIG that synthesizes a Verilog implementation based on a regular expression protocol description.
|
|
|
Automated composition of hardware components |
| |
James Smith,
Giovanni De Micheli
|
|
Pages: 14-19 |
|
doi>10.1145/277044.277048 |
|
In order to automate design reuse, methods for composing system components must be developed. The goal of this research is to automate the process of generating interfaces between hardware subsystems. The algorithms presented here can be used to generate a cycle-accurate, synchronous interface between two hardware subsystems given an HDL model of each subsystem. These algorithms have been implemented in the POLARIS hardware composition tool and have been used to generate an interface between a MIPS microprocessor and the SRAM that comprises its secondary cache. Interface generation for the MIPS R4000 is described.
|
|
|
Multilevel integral equation methods for the extraction of substrate coupling parameters in mixed-signal IC's |
| |
Mike Chou,
Jacob White
|
|
Pages: 20-25 |
|
doi>10.1145/277044.277049 |
|
The extraction of substrate coupling resistances can be formulated as a first-kind integral equation, which requires only discretization of the two-dimensional contacts. However, the result is a dense matrix problem which is too expensive to store or to factor directly. Instead, we present a novel, multigrid iterative method which converges more rapidly than previously applied Krylov-subspace methods. At each level in the multigrid hierarchy, we avoid dense matrix-vector multiplication by using moment-matching approximations and a sparsification algorithm based on eigendecomposition. Results on realistic examples demonstrate that the combined approach is up to an order of magnitude faster than a Krylov-subspace method with sparsification, and orders of magnitude faster than not using sparsification at all.
|
|
|
Phase noise in oscillators: a unifying theory and numerical methods for characterisation |
| |
Alper Demir,
Amit Mehrotra,
Jaijeet Roychowdhury
|
|
Pages: 26-31 |
|
doi>10.1145/277044.277050 |
|
Phase noise is a topic of theoretical and practical interest in electronic circuits, as well as in other fields such as optics. Although progress has been made in understanding the phenomenon, there still remain significant gaps, both in its fundamental theory and in numerical techniques for its characterisation. In this paper, we develop a solid foundation for phase noise that is valid for any oscillator, regardless of operating mechanism. We establish novel results about the dynamics of stable nonlinear oscillators in the presence of perturbations, both deterministic and random. We obtain an exact, nonlinear equation for phase error, which we solve without approximations for random perturbations. This leads us to a precise characterisation of timing jitter and spectral dispersion, for computing which we develop efficient numerical methods. We demonstrate our techniques on practical electrical oscillators, and obtain good matches with measurements even at frequencies close to the carrier, where previous techniques break down.
|
|
|
Efficient analog test methodology based on adaptive algorithms |
| |
Luigi Carro,
Marcelo Negreiros
|
|
Pages: 32-37 |
|
doi>10.1145/277044.277051 |
|
This paper describes a new, fast and economical methodology to test linear analog circuits based on adaptive algorithms. To the authors' knowledge, this is the first time such a technique has been used to test analog circuits, allowing complete fault coverage. The paper presents experimental results showing easy detection of soft, large-deviation and hard faults, with low-cost instrumentation. Component variations from 5% to 1% have been detected, as the comparison parameter (output error power) varied from 300% to 20%.
|
|
|
General AC constraint transformation for analog ICs |
| |
B. G. Arsintescu,
E. Charbon,
E. Malavasi,
U. Choudhury,
W. H. Kao
|
|
Pages: 38-43 |
|
doi>10.1145/277044.277052 |
|
The problem of designing complex analog circuits is attacked using a hierarchical top-down, constraint-driven design methodology. In this methodology, constraints are propagated automatically from high-level specifications to physical design through a sequence of gradual transformations. Constraint transformation is a critical step in the methodology, since it determines in large part the degree to which specifications are met. In this paper we describe how constraint transformations can be efficiently carried out using hierarchical parameter modeling and constrained optimization techniques. The process supports complex high-level specification handling and accounts for second-order effects, such as interconnect parasitics and mismatches. The suitability of the approach is demonstrated through a 4th-order active filter test case.
|
|
|
Design methodology used in a single-chip CMOS 900 MHz spread-spectrum wireless transceiver |
| |
Jacob Rael,
Ahmadreza Rofougaran,
Asad Abidi
|
|
Pages: 44-49 |
|
doi>10.1145/277044.277053 |
|
This paper describes the simulation and layout techniques used and developed in the design of a single-chip CMOS 900 MHz spread-spectrum wireless transceiver.
|
|
|
A video signal processor for MIMD multiprocessing |
| |
Jörg Hilgenstock,
Klaus Herrmann,
Jan Otterstedt,
Dirk Niggemeyer,
Peter Pirsch
|
|
Pages: 50-55 |
|
doi>10.1145/277044.277054 |
|
The video signal processor AxPe1280V has been developed for implementation of different video coding applications according to standards like ITU-T H.261/H.263 and ISO MPEG-1/2. It consists of a RISC processor supplemented by a coprocessor for convolution-like low-level tasks. RISC and coprocessor have been implemented in a standard cell design combined with full-custom modules. The processor was fabricated in a 0.5 µm CMOS technology and has a die size of 82 mm². It provides a peak performance of more than 1 giga arithmetic operations per second (GOPS) at 66 MHz. For processing of very computation-intensive algorithms or high data rates, several processors can be bus-connected to form a MIMD multiprocessor system.
|
|
|
Realization of a programmable parallel DSP for high performance image processing applications |
| |
Jens Peter Wittenburg,
Willm Hinrichs,
Johannes Kneip,
Martin Ohmacht,
Mladen Bereković,
Hanno Lieske,
Helge Kloos,
Peter Pirsch
|
|
Pages: 56-61 |
|
doi>10.1145/277044.277055 |
|
Architecture and design of the HiPAR-DSP, a SIMD-controlled signal processor with parallel data paths, VLIW and novel memory design. The processor architecture is derived from an analysis of the target algorithms and specified in VHDL on register transfer level. A team of more than 20 graduate students covered the whole design process, including the synthesizable VHDL description, synthesis, routing and backannotation, as well as the development of a complete software development environment. The 175 mm², 0.5 µm 3LM CMOS design with 1.2 million transistors operates at 80 MHz and achieves a sustained performance of more than 600 million arithmetic operations.
|
|
|
A multiprocessor DSP system using PADDI-2 |
| |
Roy A. Sutton,
Vason P. Srini,
Jan M. Rabaey
|
|
Pages: 62-65 |
|
doi>10.1145/277044.277056 |
|
We have integrated an image processing system built around PADDI-2, a custom 48-node MIMD parallel DSP. The system includes image processing algorithms, a graphical SFG tool, a simulator, routing tools, compilers, hardware configuration and debugging tools, application development libraries, and software implementations for hardware verification. The system board, connected to a SPARCstation via a custom Sbus controller, contains 384 processors in 8 VLSI chips. The software environment supports a multiprocessor system under development (VGI-1). The software tools and libraries are modular, with implementation dependencies isolated in layered encapsulations.
|
|
|
Design and implementation of the NUMAchine multiprocessor |
| |
A. Grbic,
S. Brown,
S. Caranci,
R. Grindley,
M. Gusat,
G. Lemieux,
K. Loveless,
N. Manjikian,
S. Srbljic,
M. Stumm,
Z. Vranesic,
Z. Zilic
|
|
Pages: 66-69 |
|
doi>10.1145/277044.277057 |
|
This paper describes the design and implementation of the NUMAchine multiprocessor. As the market for CC-NUMA multiprocessors expands, this research project provides a timely architectural design and cost-effective prototype. The key to the successful implementation of our 48-processor prototype is the use of off-the-shelf components and programmable logic devices. Since this machine will serve as a research vehicle for parallel software development, a number of hardware features to enhance experimentation have been included in the design.
|
|
|
Design and specification of embedded systems in Java using successive, formal refinement |
| |
James Shin Young,
Josh MacDonald,
Michael Shilman,
Abdallah Tabbara,
Paul Hilfinger,
A. Richard Newton
|
|
Pages: 70-75 |
|
doi>10.1145/277044.277058 |
|
Successive, formal refinement is a new approach for specification of embedded systems using a general-purpose programming language. Systems are formally modeled as Abstractable Synchronous Reactive systems, and Java is used as the design input language. A policy of use is applied to Java, in the form of language usage restrictions and class-library extensions, to ensure consistency with the formal model. A process of incremental, user-guided program transformation is used to refine a Java program until it is consistent with the policy of use. The final product is a system specification possessing the properties of the formal model, including deterministic behavior, bounded memory usage, and bounded execution time. This approach allows systems design to begin with the flexibility of a general-purpose language, followed by gradual refinement into a more restricted form necessary for specification.
|
|
|
Efficient system exploration and synthesis of applications with dynamic data storage and intensive data transfer |
| |
Julio Leao da Silva, Jr.,
Chantal Ykman-Couvreur,
Miguel Miranda,
Kris Croes,
Sven Wuytack,
Gjalt de Jong,
Francky Catthoor,
Diederik Verkest,
Paul Six,
Hugo De Man
|
|
Pages: 76-81 |
|
doi>10.1145/277044.277059 |
|
Matisse is a design flow intended for developing embedded systems characterized by tight interaction between control and data-flow behavior, intensive data storage and transfer, dynamic creation of data, and stringent real-time requirements. Matisse bridges the gap from a system specification, using a concurrent object-oriented language, to an optimized embedded single-chip HW/SW implementation. Matisse supports stepwise system-level exploration and refinement, memory architecture exploration, and gradual incorporation of timing constraints before going to traditional tools for HW synthesis, SW compilation, and HW/SW interprocessor communication synthesis. Application of Matisse to telecom protocol processing systems shows significant improvements in area usage and power consumption.
|
|
|
Design space exploration algorithm for heterogeneous multi-processor embedded system design |
| |
Ireneusz Karkowski,
Henk Corporaal
|
|
Pages: 82-87 |
|
doi>10.1145/277044.277060 |
|
A single-chip multi-processor embedded system is nowadays a feasible and very interesting option. What is needed, however, is an environment that supports the designer in transforming an algorithmic specification into a suitable parallel implementation. In this paper we present and demonstrate an important component of such an environment: an efficient design space exploration algorithm. The algorithm can be used to semi-automatically find the best parallelization of a given embedded application. It employs functional pipelining [13] and data set partitioning [16] simultaneously with source-to-source program transformations to obtain the most advantageous hierarchical parallelizations.
|
|
|
Modal processes: towards enhanced retargetability through control composition of distributed embedded systems |
| |
Pai Chou,
Gaetano Borriello
|
|
Pages: 88-93 |
|
doi>10.1145/277044.277061 |
|
To explore different points in the design space of an embedded system, it is important to be able to compose a design from reusable design components, and then map the resulting system description onto several possible target architectures with different partitionings of functionality. Today's specification models support composition styles that work well for data communication but not for control communication between concurrent processes to be mapped onto a distributed architecture. We propose a new retargetable system specification model that combines the best properties of process-based and hierarchical-FSM-based methods for modular composition of data and control. The model lends itself to automated synthesis of the run-time system for coordinating tasks on different processors in the system. The model and synthesis method are illustrated with several examples of embedded systems.
|
|
|
Design methodologies for noise in digital integrated circuits |
| |
Kenneth L. Shepard
|
|
Pages: 94-99 |
|
doi>10.1145/277044.277062 |
|
In this paper, we describe the growing problems of noise in digital integrated circuits and the design tools and techniques used to ensure the noise immunity of digital designs.
|
|
|
Taming noise in deep submicron digital integrated circuits (panel) |
| |
Kenneth Shepard,
Takahide Inoue,
Nagaraj NS
|
|
Pages: 100-101 |
|
doi>10.1145/277044.277064 |
|
As technology scales into the deep submicron regime, noise immunity is becoming a metric of comparable importance to area, timing, and power for the analysis and design of digital VLSI chips. Are functional failures due to noise really a problem in a static CMOS design? Are design rules in the circuits and interconnect sufficient to protect against noise failures? Do design rules targeted to ensure noise immunity result in an excessive penalty to performance and area due to their inherent conservatism? Is inductance in the interconnect really a problem? How much do we really need to account for capacitive coupling, inductance, and inductive coupling in delay analysis?
|
|
|
FACT: a framework for the application of throughput and power optimizing transformations to control-flow intensive behavioral descriptions |
| |
Ganesh Lakshminarayana,
Niraj K. Jha
|
|
Pages: 102-107 |
|
doi>10.1145/277044.277066 |
|
In this paper, we present an algorithm for the application of a general class of transformations to control-flow intensive behavioral descriptions. Our algorithm is based on the observation that incorporation of scheduling information can help guide the selection and application of candidate transformations, and significantly enhance the quality of the synthesized solution. The efficacy of the selected throughput and power optimizing transformations is enhanced by the ability of our algorithm to transcend basic blocks in the behavioral description. This ability is imparted to our algorithm by a general technique we have devised. Our system currently supports associativity, commutativity, distributivity, constant propagation, code motion, and loop unrolling. It is integrated with a scheduler which performs implicit loop unrolling and functional pipelining, and has the ability to parallelize the execution of independent iterative constructs whose bodies can share resources. Other transformations can easily be incorporated within the framework. We demonstrate the efficacy of our algorithm by applying it to several commonly available benchmarks. Upon synthesis, behaviors transformed by the application of our algorithm showed up to 6-fold improvement in throughput over an existing transformation algorithm, and up to 4.5-fold improvement in power over designs produced without the benefit of our algorithm.
|
|
|
Incorporating speculative execution into scheduling of control-flow intensive behavioral descriptions |
| |
Ganesh Lakshminarayana,
Anand Raghunathan,
Niraj K. Jha
|
|
Pages: 108-113 |
|
doi>10.1145/277044.277067 |
|
Speculative execution refers to the execution of parts of a computation before the execution of the conditional operations that decide whether it needs to be executed. It has been shown to be a promising technique for eliminating performance bottlenecks imposed by control flow in hardware and software implementations alike. In this paper, we present techniques to incorporate speculative execution in a fine-grained manner into scheduling of control-flow intensive behavioral descriptions. We demonstrate that failing to take into account information such as resource constraints and branch probabilities can lead to significantly sub-optimal performance. We also demonstrate that it may be necessary to speculate simultaneously along multiple paths, subject to resource constraints, in order to minimize the delay overheads incurred when prediction errors occur. Experimental results on several benchmarks show that our speculative scheduling algorithm can result in significant (up to seven-fold) improvements in performance (measured in terms of the average number of clock cycles) as compared to scheduling without speculative execution. Also, the best and worst case execution times for the speculatively performed schedules are the same as or better than the corresponding values for the schedules obtained without speculative execution.
|
|
|
The DT-model: high-level synthesis using data transfers |
| |
Shantanu Tarafdar,
Miriam Leeser
|
|
Pages: 114-121 |
|
doi>10.1145/277044.277069 |
|
We present a new model for formulating the classic HLS sub-problems: scheduling, allocation, and binding. The model is unique in its use of data transfers as the basic entity in synthesis. A data transfer represents the movement of one instance of data and contains the operation sourcing the data and all the operations using it. Our model compels the storage architecture of the design to be optimized concurrently with the execution unit. We have built a high-level synthesis system, Midas, based on our data transfer model. Midas generates designs with smaller storage and data transfer requirements than other HLS systems.
|
|
|
Rate Optimal VLSI Design from Data Flow Graph |
|
Page: 118 |
|
This paper considers the rate optimal VLSI design of a recursive data flow graph (DFG). Previous research on rate optimal scheduling is not directly applicable to VLSI design. We propose a technique that inserts buffer registers to allow overlapped rate optimal implementation of VLSI. We illustrate that nonoverlapped schedules can be implemented by a simpler control path but with a larger unfolding factor, if one exists, than overlapped schedules.
|
|
|
Planning for performance |
| |
Ralph H. J. M. Otten,
Robert K. Brayton
|
|
Pages: 122-127 |
|
doi>10.1145/277044.277071 |
|
A shift is proposed in the design of VLSI circuits. In conventional design, higher levels of synthesis produce a netlist, from which layout synthesis builds a mask specification for manufacturing. Timing analysis is built into a feedback loop to detect timing violations which are then used to update specifications to synthesis. Such iteration is undesirable, and for very high performance designs, infeasible. The problem is likely to become much worse with future generations of technology. To achieve a non-iterative design flow, we propose that early synthesis stages should use “wireplanning” to distribute delays over the functional elements and interconnect, and layout synthesis should use its degrees of freedom to realize those delays. In this paper we attempt to quantify this problem for future technologies and propose some solutions for a “constant delay” methodology.
|
|
|
A DSM design flow: putting floorplanning, technology-mapping, and gate-placement together |
| |
Amir H. Salek,
Jinan Lou,
Massoud Pedram
|
|
Pages: 128-134 |
|
doi>10.1145/277044.277072 |
|
This paper presents an integrated design flow which combines floorplanning, technology mapping, and placement using a dynamic programming algorithm. The proposed design flow consists of five steps: maximum tree sub-structure formation, levelized cluster tree construction, minimum area implementation using 2-D shape functions, critical path identification, and repeated application of simultaneous floorplanning, technology mapping and gate placement along the timing critical paths. Experimental results obtained from an extensive set of benchmarks demonstrate the effectiveness of the proposed flow.
|
|
|
Framework encapsulations: a new approach to CAD tool interoperability |
| |
Peter R. Sutton,
Stephen W. Director
|
|
Pages: 134-139 |
|
doi>10.1145/277044.277074 |
|
Today's complex leading-edge design processes require the use of multiple CAD tools that operate in multiple frameworks, making management of the complete design process difficult. This paper introduces the concept of framework encapsulations: software wrappers around complete CAD frameworks that allow the design data and flow management services of a framework to be utilized by a common design process management tool. This concept has been applied to the Minerva II Design Process Manager, enabling Minerva II to manage the design process across multiple CAD frameworks, and potentially multiple design disciplines.
|
|
|
A geographically distributed framework for embedded system design and validation |
| |
Ken Hines,
Gaetano Borriello
|
|
Pages: 140-145 |
|
doi>10.1145/277044.277075 |
|
Full text: PDF
|
|
The difficulty of emb edded system co-design is increasing rapidly due to the incr easing complexity of individual parts, the variety of parts available and pr essure to use multiple processors to me et performanc e criteria. V alidation tools ...
The difficulty of emb edded system co-design is increasing rapidly due to the incr easing complexity of individual parts, the variety of parts available and pr essure to use multiple processors to me et performanc e criteria. V alidation tools should contain sever al features in order to keep up with this trend, including the ability to dynamically change detail levels, built in protection for intellectual property, and supp ort for gradual migr ation of functionalityfrom a simulation environment to the real hardware. In this paper, we present our appr oach to the problem which includes a geographically distributed co-simulation framework. This fr amework is a system of nodes such that each can include either portions of the simulator or real hardware. In supp ort of this, the framework includes a me chanism for maintaining consistent versions of virtual time.
|
|
|
WELD—an environment for Web-based electronic design |
| |
Francis L. Chan,
Mark D. Spiller,
A. Richard Newton
|
|
Pages: 146-151 |
|
doi>10.1145/277044.277077 |
|
Full text: PDF
|
|
Increasing size and geographical separation of design data and teams has created a need for a network-based electronic design environment that is scaleable, adaptable, secure, highly available, and cost effective. In the WELD project we are evaluating aspects of the network integration and communication infrastructure needed to enable such a distributed design environment. The architecture of WELD and the components developed to implement the system, together with performance results, are described and evaluated.
|
|
|
OCCOM: efficient computation of observability-based code coverage metrics for functional verification |
| |
Farzan Fallah,
Srinivas Devadas,
Kurt Keutzer
|
|
Pages: 152-157 |
|
doi>10.1145/277044.277078 |
|
Full text: PDF
|
|
Functional simulation is still the primary workhorse for verifying the functional correctness of hardware designs. Functional verification is necessarily incomplete because it is not computationally feasible to exhaustively simulate designs. It is therefore important to quantitatively measure the degree of verification coverage of the design. Coverage metrics proposed for measuring the extent of design verification provided by a set of functional simulation vectors should compute statement execution counts (controllability information), and check to see whether effects of possible errors activated by program stimuli can be observed at the circuit outputs (observability information). Unfortunately, the metrics proposed thus far either do not compute both types of information, or are inefficient, i.e., the overhead of computing the metric is very large.
In this paper, we provide the details of an efficient method to compute an Observability-based Code COverage Metric (OCCOM) that can be used while simulating complex HDL designs. This method offers a more accurate assessment of design verification coverage than line coverage, and is significantly more computationally efficient than prior efforts to assess observability information because it breaks up the computation into two phases: Functional simulation of a modified HDL model, followed by analysis of a flowgraph extracted from the HDL model. Commercial HDL simulators can be directly used for the time-consuming first phase, and the second phase can be performed efficiently using concurrent evaluation techniques.
|
|
|
User defined coverage—a tool supported methodology for design verification |
| |
Raanan Grinwald,
Eran Harel,
Michael Orgad,
Shmuel Ur,
Avi Ziv
|
|
Pages: 158-163 |
|
doi>10.1145/277044.277081 |
|
Full text: PDF
|
|
This paper describes a new coverage methodology developed at IBM's Haifa Research Lab. The main idea behind the methodology is a separation of the coverage model definition from the coverage analysis tool. This enables the user to define the coverage models that best fit the points of significance in the design, and still have the benefits of a coverage tool. To support this methodology, we developed a new coverage measurement tool called Comet. The tool is currently used in many domains, such as system verification and micro-architecture verification, and in many types of designs ranging from systems, to microprocessors, and ASICs.
|
|
|
Enhanced visibility and performance in functional verification by reconstruction |
| |
Joshua Marantz
|
|
Pages: 164-169 |
|
doi>10.1145/277044.277083 |
|
Full text: PDF
|
|
Cycle simulators, in-circuit emulators, and hardware accelerators have made it possible to rapidly model the functionality of large digital designs. But these techniques provide limited visibility of internal design nodes, making debugging hard. Simulators run slowly when all nodes are traced. Emulators provide full visibility only with limited depth, or with greatly reduced speed. This paper discusses software techniques for increasing design visibility while reducing tracing overhead in simulation, and achieving 100% visibility in emulation without reducing speed or compromising depth.
|
|
|
Virtual chip: making functional models work on real target systems |
| |
Namseung Kim,
Hoon Choi,
Seungjong Lee,
Seungwang Lee,
In-Cheolo Park,
Chong-Min Kyung
|
|
Pages: 170-173 |
|
doi>10.1145/277044.277084 |
|
Full text: PDF
|
|
As design complexity increases, functional verification becomes a crucial issue to ensure design correctness at an early design stage. Traditional methods for verifying functional designs are based on the HDL simulation, which is becoming the bottleneck of the design cycle because of the increasing design complexity. The accurate verification ability at the architectural level through a large set of the test programs and real world applications is a foundation for the next design step. In this paper, we describe how to verify a functional model on a real target system. The proposed methodology called virtual chip makes it possible not only to check the functional correctness on real systems, but also to explore design space by measuring the performance effectiveness of various architecture parameters under real applications. Experimental results show that functional models can be verified on real systems using complicated application programs. The proposed functional verification method is faster than HDL simulation and even comparable to emulation.
|
|
|
Hardware/software co-design (panel): the next embedded system design challenge |
| |
Peter Heller
|
|
Pages: 174-175 |
|
doi>10.1145/277044.277086 |
|
Full text: PDF
|
|
With the proliferation of consumer electronics, the number of embedded systems is growing dramatically, according to Collett International Inc.'s research. At the same time, embedded systems are growing in size and complexity. Another major trend is to increasingly implement functionality in software. Design teams are using software to differentiate products, increase flexibility, respond to changing standards, enable inexpensive upgradability, and get products to market sooner. This confluence of forces is causing design teams to confront a host of new challenges and even to re-evaluate their fundamental design practices. The challenges of designing hardware and software in concert are now coming to the forefront. Some of the topics to be discussed include: Short of producing synthesizable output, can hardware/software co-design provide meaningful value? Can automatic hardware and software partitioning tools replace engineering judgement? Who is responsible for hardware/software co-design? System architects? Hardware designers? Software designers? Can today's tools truly optimize system functionality and performance tradeoffs between hardware and software implementation? Are commercial real-time operating systems impacting hardware and software tradeoff decisions?
|
|
|
Power optimization of variable voltage core-based systems |
| |
Inki Hong,
Darko Kirovski,
Gang Qu,
Miodrag Potkonjak,
Mani B. Srivastava
|
|
Pages: 176-181 |
|
doi>10.1145/277044.277088 |
|
Full text: PDF
|
|
The growing class of portable systems, such as personal computing and communication devices, has resulted in a new set of system design requirements, mainly characterized by dominant importance of power minimization and design reuse. We develop the design methodology for the low power core-based real-time system-on-chip based on dynamically variable voltage hardware. The key challenge is to develop effective scheduling techniques that treat voltage as a variable to be determined, in addition to the conventional task scheduling and allocation. Our synthesis technique also addresses the selection of the processor core and the determination of the instruction and data cache size and configuration so as to fully exploit dynamically variable voltage hardware, which result in significantly lower power consumption for a set of target applications than existing techniques. The highlight of the proposed approach is the non-preemptive scheduling heuristic which results in solutions very close to optimal ones for many test cases. The effectiveness of the approach is demonstrated on a variety of modern industrial-strength multimedia and communication applications.
|
|
|
Policy optimization for dynamic power management |
| |
G. A. Paleologo,
L. Benini,
A. Bogliolo,
G. De Micheli
|
|
Pages: 182-187 |
|
doi>10.1145/277044.277094 |
|
Full text: PDF
|
|
Dynamic power management schemes (also called policies) can be used to control the power consumption levels of electronic systems, by setting their components in different states, each characterized by a performance level and a power consumption. In this paper, we describe power-managed systems using a finite-state, stochastic model. Furthermore, we show that the fundamental problem of finding an optimal policy which maximizes the average performance level of a system, subject to a constraint on the power consumption, can be formulated as a stochastic optimization problem called policy optimization. Policy optimization can be solved exactly in polynomial time (in the number of states of the model). We implemented a policy optimization tool and tested the quality of the optimal policies on a realistic case study.
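The finite-state, stochastic model the abstract describes can be made concrete with a toy evaluation step: given one fixed policy, the power-managed system is a Markov chain over states (e.g. busy, sleep), and its long-run average power and performance follow from the chain's stationary distribution. The sketch below is generic Markov-chain arithmetic in Python, not the paper's LP-based policy optimization; the two-state example and its power/performance numbers are invented for illustration.

```python
# Evaluate one power-management policy modeled as a Markov chain.
# P[i][j] = probability of moving from state i to state j per period.
def stationary(P, iters=500):
    """Stationary distribution by power iteration (rows of P sum to 1)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def expected(pi, values):
    """Long-run average of a per-state quantity (power or performance)."""
    return sum(p * v for p, v in zip(pi, values))

# Hypothetical two-state system: state 0 = busy, state 1 = sleep.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
avg_power = expected(pi, [2.0, 0.1])   # watts per state (made-up numbers)
avg_perf = expected(pi, [1.0, 0.0])    # service rate per state
```

Optimizing over policies, as the paper does, would treat the transition probabilities themselves as decision variables under a power constraint; this sketch only shows how a single candidate policy is scored.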
|
|
|
A framework for estimating and minimizing energy dissipation of embedded HW/SW systems |
| |
Yanbing Li,
Jörg Henkel
|
|
Pages: 188-193 |
|
doi>10.1145/277044.277097 |
|
Full text: PDF
|
|
Embedded system design is one of the most challenging tasks in VLSI CAD because of the vast amount of system parameters to fix and the great variety of constraints to meet. In this paper we focus on the constraint of low energy dissipation, an indispensable peculiarity of embedded mobile computing systems. We present the first comprehensive framework that simultaneously evaluates the tradeoffs of energy dissipations of software and hardware such as caches and main memory. Unlike previous work in low power research which focused only on software or hardware, our framework optimizes system parameters to minimize energy dissipation of the overall system. The trade-off between system performance and energy dissipation is also explored. Experimental results show that our Avalanche framework can drastically reduce system energy dissipation.
|
|
|
Using reconfigurable computing techniques to accelerate problems in the CAD domain: a case study with Boolean satisfiability |
| |
Peixin Zhong,
Pranav Ashar,
Sharad Malik,
Margaret Martonosi
|
|
Pages: 194-199 |
|
doi>10.1145/277044.277098 |
|
Full text: PDF
|
|
The Boolean satisfiability problem lies at the core of several CAD applications, including automatic test pattern generation and logic synthesis. This paper describes and evaluates an approach for accelerating Boolean satisfiability using configurable hardware. Our approach harnesses the increasing speed and capacity of field-programmable gate arrays by tailoring the SAT-solver circuit to the particular formula being solved. This input-specific technique gets high performance due both to (i) a direct mapping of Boolean operations to logic gates, and (ii) large amounts of fine-grain parallelism in the implication processing. Overall, these strategies yield impressive speedups (>200X in many cases) compared to current software approaches, and they require only modest amounts of hardware. In a broader sense, this paper alerts the hardware design community to the increasing importance of input-specific designs, and documents their promise via a quantitative study of input-specific SAT solving.
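For readers unfamiliar with the problem being accelerated: a SAT instance is a set of CNF clauses, and "input-specific" means the solver is specialized to one fixed formula. The brute-force Python sketch below only states the problem; it models nothing of the paper's FPGA circuit mapping or implication parallelism, and the literal encoding (signed integers, variable i as ±i) is a convention chosen for this example.

```python
from itertools import product

# Toy CNF satisfiability check by exhaustive enumeration. A clause is a
# list of literals; literal +i means variable i is true, -i means false.
def is_satisfiable(clauses, n_vars):
    for bits in product([False, True], repeat=n_vars):
        # An assignment works iff every clause has at least one true literal.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

The paper's FPGA approach replaces this exponential software loop with a formula-specific circuit in which all clauses are evaluated, and implications propagated, in parallel.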
|
|
|
Fast exact minimization of BDDs |
| |
Rolf Drechsler,
Nicole Drechsler,
Wolfgang Günther
|
|
Pages: 200-205 |
|
doi>10.1145/277044.277099 |
|
Full text: PDF
|
|
We present a new exact algorithm for finding the optimal variable ordering for reduced ordered Binary Decision Diagrams (BDDs). The algorithm makes use of a lower bound technique known from VLSI design. Up to now this technique has been used only for theoretical considerations and it is adapted here for our purpose. Furthermore, the algorithm supports symmetry aspects and makes use of a hashing based data structure. Experimental results are given to demonstrate the efficiency of our approach. We succeeded in minimizing adder functions with up to 64 variables, while all other previously presented approaches fail.
|
|
|
Boolean matching for large libraries |
| |
Uwe Hinsberger,
Reiner Kolla
|
|
Pages: 206-211 |
|
doi>10.1145/277044.277100 |
|
Full text: PDF
|
|
Boolean matching tackles the problem of whether a subcircuit of a boolean network can be substituted by a cell from a cell library. In previous approaches [7, 10, 8] each pair of a subcircuit and a cell is tested for NPN equivalence. This becomes very expensive if the cell library is large. In our approach the time complexity for matching a subcircuit against a library L is almost independent of the size of L. CPU time also remains small for matching a subcircuit against the huge set of functions obtained by bridging and fixing cell inputs; but the use of these functions in technology mapping is very profitable. Our method is based on a canonical representative for each NPN equivalence class. We show how this representative can be computed efficiently and how it can be used for matching a boolean function against a set of library functions.
|
|
|
A fast hierarchical algorithm for 3-D capacitance extraction |
| |
Weiping Shi,
Jianguo Liu,
Naveen Kakani,
Tiejun Yu
|
|
Pages: 212-217 |
|
doi>10.1145/277044.277101 |
|
Full text: PDF
|
|
We present a new algorithm for computing the capacitance of three-dimensional perfect electrical conductors of complex structures. The new algorithm is significantly faster and uses much less memory than previous best algorithms, and is kernel independent.
The new algorithm is based on a hierarchical algorithm for the n-body problem, and is an acceleration of the boundary-element method for solving the integral equation associated with the capacitance extraction problem. The algorithm first adaptively subdivides the conductor surfaces into panels according to an estimation of the potential coefficients and a user-supplied error band. The algorithm stores the potential coefficient matrix in a hierarchical data structure of size O(n), although the matrix is of size n² if expanded explicitly, where n is the number of panels. The hierarchical data structure allows us to multiply the coefficient matrix with any vector in O(n) time. Finally, we use a generalized minimal residual algorithm to solve m linear systems each of size n × n in O(mn) time, where m is the number of conductors.
The new algorithm is implemented and the performance is compared with previous best algorithms. For the k × k bus example, our algorithm is 100 to 40 times faster than FastCap, and uses 1/100 to 1/60 of the memory used by FastCap. The results computed by the new algorithm are within 2.7% of those computed by FastCap.
|
|
|
Boundary element method macromodels for 2-D hierarchical capacitance extraction |
| |
E. Aykut Dengi,
Ronald A. Rohrer
|
|
Pages: 218-223 |
|
doi>10.1145/277044.277102 |
|
Full text: PDF
|
|
|
|
|
Efficient three-dimensional extraction based on static and full-wave layered Green's functions |
| |
Jinsong Zhao,
Wayne W. M. Dai,
Sharad Kapur,
David E. Long
|
|
Pages: 224-229 |
|
doi>10.1145/277044.277103 |
|
Full text: PDF
|
|
Integral equation approaches based on layered media Green's functions are often used to extract models of integrated circuit structures. The primary advantage of these approaches over equivalent-source based schemes is the dramatic reduction in problem size. When combined with an SVD-accelerated scheme for the solution of the associated dense linear system, this leads to a substantial speedup.
In this paper we derive and solve for these multilayered 3D Green's functions using a transmission line circuit analog. A generalized image method for an arbitrary number of layers is presented. This method is rapidly convergent for near-field interactions. For the far field, a Chebyshev interpolation approach is adopted, where a database is precomputed (using a Fast Hankel Transform) and stored. The combination of these two approaches leads to an extremely efficient scheme for the generation of Green's functions.
We combine the SVD-accelerated integral equation solver IES3 with the multilayered Green's function approach, apply it to the extraction of IC parasitics and passive components, and we demonstrate its speed, accuracy and versatility via a number of examples.
|
|
|
Robust Elmore delay models suitable for full chip timing verification of a 600MHz CMOS microprocessor |
| |
Nevine Nassif,
Madhav P. Desai,
Dale H. Hall
|
|
Pages: 230-235 |
|
doi>10.1145/277044.277104 |
|
Full text: PDF
|
|
In this paper we introduce a method for computing the Elmore delay of MOS circuits which relies on a model of the capacitance of MOS devices and a model of the Elmore delay of individual MOS devices. The resistance of a device is not explicitly modelled. The Elmore models are used to compute the Elmore delay and the 50% point delay of CMOS circuits in a static timing verifier. Elmore delays computed with these models fall within 10% of SPICE and can be computed thousands of times faster than if computed using SPICE. These models were used to verify critical paths during the design of a 600MHz microprocessor.
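The Elmore delay the abstract relies on has a compact recursive form on an RC tree: the delay at a node equals the delay at its parent plus the branch resistance times the total capacitance downstream of that branch. The Python sketch below illustrates only that textbook recursion, not the authors' device-level capacitance and delay models; the tree encoding and names are invented for the example.

```python
# Toy Elmore delay on an RC tree.
# tree: {node: (parent, R_to_parent)}, the root has parent None.
# caps: {node: grounded capacitance at that node}.
def elmore_delay(tree, caps):
    children = {}
    for n, (p, _) in tree.items():
        if p is not None:
            children.setdefault(p, []).append(n)

    down = {}
    def downstream(n):
        # Node's own cap plus everything hanging below it, memoized.
        if n not in down:
            down[n] = caps[n] + sum(downstream(c) for c in children.get(n, []))
        return down[n]

    delay = {}
    def walk(n, acc):
        p, r = tree[n]
        if p is not None:
            # Each branch contributes R * (total downstream capacitance).
            acc += r * downstream(n)
        delay[n] = acc
        for c in children.get(n, []):
            walk(c, acc)

    for n, (p, _) in tree.items():
        if p is None:
            walk(n, 0.0)
    return delay
```

For a chain src -(R=1)- a (C=1) -(R=1)- b (C=1), this gives delay 2 at a (one ohm driving two farads) and 3 at b.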
|
|
|
A top-down design environment for developing pipelined datapaths |
| |
Robert McGraw,
James H. Aylor,
Robert H. Klenke
|
|
Pages: 236-241 |
|
doi>10.1145/277044.277105 |
|
Full text: PDF
|
|
This paper presents a design environment for cycle-based systems, such as microprocessors, that permits modeling of these systems at various levels, from the abstract system level, through the detailed RTL level, to an actual implementation. The environment allows the models to be refined to lower levels in a step-wise manner. The environment provides the ability to obtain meaningful metrics from abstract models of a processor's architecture. This capability allows design alternatives to be evaluated earlier in the design cycle, thus eliminating costly redesign and reducing the processor time to market.
|
|
|
Validation of an architectural level power analysis technique |
| |
Rita Yu Chen,
Robert M. Owens,
Mary Jane Irwin,
Raminder S. Bajwa
|
|
Pages: 242-245 |
|
doi>10.1145/277044.277106 |
|
Full text: PDF
|
|
This paper presents a technique used to do power analysis of a real processor at the architectural level. The target processor integrates a 16-bit DSP and a 32-bit RISC on a single chip. Our power estimator provides power consumption data of the architecture based on the instruction/data flow stream. We demonstrate the accuracy of the estimator by comparing the power values it produces against measurements made by a gate level power simulator for the same benchmark set. Our estimation approach has been shown to provide very efficient, accurate power analysis at the architectural level.
|
|
|
Design methodology of a 200MHz superscalar microprocessor: SH-4 |
| |
Toshihiro Hattori,
Yusuke Nitta,
Mitsuho Seki,
Susumu Narita,
Kunio Uchiyama,
Tsuyoshi Takahashi,
Ryuichi Satomura
|
|
Pages: 246-249 |
|
doi>10.1145/277044.277108 |
|
Full text: PDF
|
|
A new design methodology focusing on high speed operation and short design time is described for the SH-4 200MHz superscalar microprocessor. Random test generation, logic emulation, and formal verification are applied to logic verification for shortening design time. Delay budgeting, forward/back annotation, and clock design are key features for timing driven design.
|
|
|
How much analog does a designer need to know for successful mixed-signal design? (panel) |
| |
Stephan Ohr
|
|
Page: 250 |
|
doi>10.1145/277044.277112 |
|
Full text: PDF
|
|
|
|
|
Hierarchical algorithms for assessing probabilistic constraints on system performance |
| |
G. de Veciana,
M. Jacome,
J.-H. Guo
|
|
Pages: 251-256 |
|
doi>10.1145/277044.277113 |
|
Full text: PDF
|
|
We propose an algorithm for assessing probabilistic performance constraints for systems including components with uncertain delays. We make a case for designing systems based on a probabilistic relaxation of performance constraints, as this has the potential for resulting in lower silicon area and/or power consumption. We consider a concrete example, an MPEG decoder, for which we discuss modeling and assessment of probabilistic throughput constraints.
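The core probabilistic question the abstract poses — what fraction of the time does a system of components with uncertain delays meet a deadline — can be illustrated with elementary probability: for a chain of independent stages, the total-delay distribution is the convolution of the per-stage distributions. The Python sketch below is that generic calculation, not the paper's hierarchical algorithm; the discrete distributions are invented for the example.

```python
# Probabilistic timing assessment for a chain of stages with uncertain,
# independent, discrete delays given as {delay_value: probability}.
def convolve(d1, d2):
    """Distribution of the sum of two independent discrete delays."""
    out = {}
    for t1, p1 in d1.items():
        for t2, p2 in d2.items():
            out[t1 + t2] = out.get(t1 + t2, 0.0) + p1 * p2
    return out

def prob_meets_deadline(stage_dists, deadline):
    """P(total pipeline delay <= deadline)."""
    total = {0: 1.0}  # zero delay with certainty before any stage
    for d in stage_dists:
        total = convolve(total, d)
    return sum(p for t, p in total.items() if t <= deadline)
```

A probabilistic relaxation as argued in the paper would accept, say, a 95% on-time probability instead of sizing hardware for the worst-case sum.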
|
|
|
A tool for performance estimation of networked embedded end-systems |
| |
Asawaree Kalavade,
Pratyush Moghé
|
|
Pages: 257-262 |
|
doi>10.1145/277044.277116 |
|
Full text: PDF
|
|
Networked embedded systems are expected to support adaptive streaming audio/video applications with soft real-time constraints. These systems can be designed in a cost efficient manner only if their architecture exploits the “leads” suggested by clever compile-time performance estimators. However, performance estimation of networked embedded systems is a non-trivial problem. The computational requirements of such systems show statistical variations that stem from several interacting factors. At the slowest time scale, applications can adapt to network bandwidth by configuring the processing functionality of their tasks (e.g. compression parameters). Also, there could be significant execution time variations within a task. Thus, it is tricky to compute the net processing demand of several such applications on a system architecture, especially if the system schedules these applications using prioritized run-time schedulers.
In this paper, we describe an analytical tool called AsaP for fast performance estimation of such embedded systems. AsaP builds approximate models of these systems and characterizes the processing load on the system as a stochastic process. The output of AsaP is an exact distribution of the processing delay of each application. This is a powerful result that can be leveraged for efficient design of multimedia networked systems requiring soft real-time guarantees. It is also the first known framework that quantifies the effect of runtime schedulers (FCFS, RM, EDF) on the performance of such systems.
|
|
|
Rate derivation and its applications to reactive, real-time embedded systems |
| |
Ali Dasdan,
Dinesh Ramanathan,
Rajesh K. Gupta
|
|
Pages: 263-268 |
|
doi>10.1145/277044.277118 |
|
Full text: PDF
|
|
An embedded system (the system) continuously interacts with its environment under strict timing constraints, called the external constraints, and it is important to know how these external constraints translate to time budgets, called the internal constraints, on the tasks of the system. Knowing these time budgets reduces the complexity of the system's design and validation problem and helps the designers maintain simultaneous control over the system's functional as well as temporal correctness from the beginning of the design flow. The translation is carried out by first deriving the rate of each task in the system, hence the term “rate derivation”, using the system's task structure and the rates of the input stimuli coming into the system from its environment. The derived task rates are later used to derive and validate the rest of the internal as well as external constraints. This paper proposes a general task graph model to represent the system's task structure, techniques for deriving and validating the system's timing constraints, and a hardware/software codesign methodology that puts everything together.
|
|
|
Generic global placement and floorplanning |
| |
Hans Eisenmann,
Frank M. Johannes
|
|
Pages: 269-274 |
|
doi>10.1145/277044.277119 |
|
Full text: PDF
|
|
We present a new force directed method for global placement. Besides the well-known wire length dependent forces we use additional forces to reduce cell overlaps and to consider the placement area. Compared to existing approaches, the main advantage is that the algorithm provides increased flexibility and enables a variety of demanding applications. Our algorithm is capable of addressing the problems of global placement, floorplanning, timing minimization and interaction to logic synthesis. Among the considered objective functions are area, timing, congestion and heat distribution. The iterative nature of the algorithm assures that timing requirements are precisely met. While showing similar CPU time requirements it outperforms Gordian by an average of 6 percent and TimberWolf by an average of 8 percent in wire length and yields significantly better timing results.
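The wire-length-dependent forces mentioned in the abstract correspond to the classic quadratic placement core: with quadratic wire length, each movable cell's optimal position is the (weighted) average of its neighbors' positions, which can be solved by simple Gauss-Seidel relaxation. The Python sketch below shows only that core; the paper's overlap-removal forces, congestion and heat objectives, and timing iteration are not modeled, and the netlist encoding is invented for the example.

```python
# Minimal quadratic ("force-directed") placement core. Fixed pads anchor
# the solution; each movable cell relaxes to its neighbors' mean position.
def quadratic_place(edges, fixed, movable, iters=200):
    pos = dict(fixed)                      # {name: (x, y)} for fixed pads
    for m in movable:
        pos[m] = (0.0, 0.0)                # arbitrary starting point
    nbrs = {m: [] for m in movable}
    for a, b in edges:                     # two-pin nets as (cell, cell)
        if a in nbrs:
            nbrs[a].append(b)
        if b in nbrs:
            nbrs[b].append(a)
    for _ in range(iters):                 # Gauss-Seidel sweeps
        for m in movable:
            if not nbrs[m]:
                continue                   # unconnected cell: leave in place
            xs = [pos[n][0] for n in nbrs[m]]
            ys = [pos[n][1] for n in nbrs[m]]
            pos[m] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return pos
```

For a chain pad(0,0) - a - b - pad(3,0), the sweeps converge to a at (1,0) and b at (2,0), the minimum of the quadratic wire length; without the additional spreading forces the paper adds, cells of a dense netlist would collapse toward each other.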
|
|
|
Congestion driven quadratic placement |
| |
Phiroze N. Parakh,
Richard B. Brown,
Karem A. Sakallah
|
|
Pages: 275-278 |
|
doi>10.1145/277044.277121 |
|
Full text: PDF
|
|
This paper introduces and demonstrates an extension to quadratic placement that accounts for wiring congestion. The algorithm uses an A* router and line-probe heuristics on region-based routing graphs to compute routing cost. The interplay between routing analysis and quadratic placement using a growth matrix permits global treatment of congestion. Further reduction in congestion is obtained by the relaxation of pin constraints. Experiments show improvements in wireability.
|
|
|
Potential-NRG: placement with incomplete data |
| |
Maogang Wang,
Prithviraj Banerjee,
Majid Sarrafzadeh
|
|
Pages: 279-282 |
|
doi>10.1145/277044.277123 |
|
Full text: PDF
|
|
Traditional placement problems are studied under a fully specified cell library and a complete netlist. However, in the first, e.g., 2 years of a 2-3 year microprocessor design cycle, the detailed netlist is unavailable. For area and performance estimation, layout must nevertheless be done with incomplete information. Another source of incompleteness comes from reuse of instances from earlier design generations; these instances and their parameters will change as the project evolves. The problem of placement with incomplete data (PID) can be abstracted as having to place a circuit when pn% of the nets are missing. The key challenge in PID is how to add missing cells and nets.
In this paper, two “patching methods” for adding missing nets and cells are proposed. The methods are called abstraction and fusion.
Experimental results are very interesting and illustrative. First, they show that PID is a difficult problem and an arbitrary (and perhaps intuitively sound) method may not produce high-quality results. Experiments verify that the abstraction method is a very good predictor and that fusion is not, because circuits produced by abstraction retain many of the properties of the original circuits. Summary Table 3 in Section 4 shows that when a circuit has 10% incompleteness, abstraction can predict the final total wirelength with an error of 5.8%, while fusion has a 67.8% error in predicting the wirelength in the same circuit.
|
|
|
Performance-driven multi-FPGA partitioning using functional clustering and replication |
| |
Wen-Jong Fang,
Allen C.-H. Wu
|
|
Pages: 283-286 |
|
doi>10.1145/277044.277125 |
|
Full text: PDF
|
|
This paper presents a new performance-driven partitioning method for multi-FPGA designs. The proposed method consists of three steps: (1) functional-cluster formation, (2) slack computation, and (3) set-covering-based partitioning with functional replication. The proposed method performs multi-FPGA partitioning by taking into account path delays and design structural information. We introduce a functional replication technique which performs circuit replications at the functional-cluster level instead of the cell level for delay and interconnect minimization. Experimental results on a number of benchmarks and industrial designs demonstrate that the proposed method achieves high-performance and high-density multi-FPGA partitions.
|
|
|
Multi-pad power/ground network design for uniform distribution of ground bounce |
| |
Jaewon Oh,
Massoud Pedram
|
|
Pages: 287-290 |
|
doi>10.1145/277044.277128 |
|
Full text: PDF
|
|
This paper presents a method for power and ground (p/g) network routing for high speed CMOS chips with multiple p/g pads. Our objective is not to reduce the total amount of the ground bounce, but to distribute it more evenly among the pads while the routing area is kept to a minimum. We first show that proper p/g terminal to pad assignment is necessary to reduce the maximum ground bounce and then present a heuristic for performing simultaneous assignment and p/g net routing. Experimental results demonstrate the effectiveness of our method.
|
|
|
Layout extraction and verification methodology for CMOS I/O circuits |
| |
Tong Li,
Sung-Mo Kang
|
|
Pages: 291-296 |
|
doi>10.1145/277044.277129 |
|
Full text: PDF
|
|
This paper presents a layout extraction and verification methodology which targets reliability-driven I/O design for CMOS VLSI chips, specifically to guard against electrostatic discharge (ESD) stress and latchup. We propose a new device extraction approach to identify devices commonly used in CMOS I/O circuits including MOS transistors, field transistors, diffusion and well resistors, diodes and silicon controlled rectifiers (SCRs), etc. Unlike other extractors, our extractor identifies the circuit-level netlist based on the specified ESD stress condition. In addition, novel techniques are proposed for the identification of parasitic bipolar junction transistors (BJTs).
|
|
|
A mixed nodal-mesh formulation for efficient extraction and passive reduced-order modeling of 3D interconnects |
| |
Nuno Marques,
Mattan Kamon,
Jacob White,
L. Miguel Silveira
|
|
Pages: 297-302 |
|
doi>10.1145/277044.277132 |
|
Full text: PDF
|
|
As VLSI circuit speeds have increased, reliable chip and system design can no longer be performed without accurate three-dimensional interconnect models. In this paper, we describe an integral equation approach to modeling the impedance of interconnect structures, accounting for both the charge accumulation on the surface of conductors and the current traveling in their interior. Our formulation, based on a combination of nodal and mesh analysis, has the required properties to be combined with Model Order Reduction techniques to generate accurate and guaranteed passive low order interconnect models for efficient inclusion in standard circuit simulators. Furthermore, the formulation is shown to be more flexible and efficient than previously reported methods.
|
|
|
Layout based frequency dependent inductance and resistance extraction for on-chip interconnect timing analysis |
| |
Byron Krauter,
Sharad Mehrotra
|
|
Pages: 303-308 |
|
doi>10.1145/277044.277133 |
|
Full text: PDF
|
|
It is well understood that frequency independent lumped-element circuits can be used to accurately model proximity and skin effects in transmission lines [7]. Furthermore, it is also understood that these circuits can be synthesized knowing only the high and the low frequency resistances and inductances [4]. Existing VLSI extraction tools, however, are not efficient enough to solve for the frequency dependent resistances and inductances on large VLSI layouts, nor do they synthesize circuits suitable for timing analysis. We propose a rules-based method that efficiently and accurately captures the high and low frequency characteristics directly from layout shapes, and subsequently synthesizes a simple frequency independent ladder circuit suitable for timing analysis. We compare our results to other simulation results.
|
|
|
A methodology for guided behavioral-level optimization |
| |
Lisa Guerra,
Miodrag Potkonjak,
Jan Rabaey
|
|
Pages: 309-314 |
|
doi>10.1145/277044.277134 |
|
Full text: PDF
|
|
Optimization at the early stages of design is crucial. However, due to an overwhelming number of design and optimization options, design exploration is often conducted in a qualitative, ad-hoc manner. This paper presents a methodology and interactive environment for guiding the exploration process. A prototype targeting behavioral-level optimization for datapath-intensive ASIC implementations has been developed. The key to the approach is encapsulated knowledge about the various optimizations and a set of techniques to automatically extract the “essence” of a design description. At each stage in the exploration process, the system suggests and ranks potential optimizations, both in terms of immediate and longer-term impact. It also provides evaluations of the design and of the likely effects each optimization will have on metrics like power and performance. In the new approach, the designer is responsible for making the actual optimization selections. However, using the provided guidance, designers can make decisions in a more informed manner, and therefore can explore the design solution space more effectively. The effectiveness of the approach is demonstrated on a number of designs.
|
|
|
A programming environment for the design of complex high speed ASICs |
| |
Patrick Schaumont,
Serge Vernalde,
Luc Rijnders,
Marc Engels,
Ivo Bolsens
|
|
Pages: 315-320 |
|
doi>10.1145/277044.277135 |
|
Full text: PDF
|
|
A C++ based programming environment for the design of complex high speed ASICs is presented. The design of a 75 Kgate DECT transceiver is used as a driver example. Compact descriptions, combined with efficient simulation and synthesis strategies, are essential for the design of such a complex system. It is shown how a C++ programming approach outperforms traditional HDL-based methods.
|
|
|
Media architecture: general purpose vs. multiple application-specific programmable processor |
| |
Chunho Lee,
Johnson Kin,
Miodrag Potkonjak,
William H. Mangione-Smith
|
|
Pages: 321-326 |
|
doi>10.1145/277044.277136 |
|
Full text: PDF
|
|
In this paper we report a framework that makes it possible for a designer to rapidly explore the application-specific programmable processor design space under area constraints. The framework uses a production-quality compiler and simulation tools to synthesize a high performance machine for an application. Using the framework we evaluate the validity of the fundamental assumption behind the development of application-specific programmable processors. Application-specific processors are based on the idea that applications differ from each other in key architectural parameters, such as the available instruction-level parallelism, demand on various hardware components (e.g. cache memory units, register files) and the need for different numbers of functional units. We found that the framework introduced in this paper can be valuable in making early design decisions such as area and architecture trade-offs, cache and instruction issue width trade-offs under area constraints, and the number of branch units and issue width.
|
|
|
User experience with high level formal verification (panel) |
| |
Randy E. Bryant,
G. Musgrave
|
|
Page: 327 |
|
doi>10.1145/277044.277137 |
|
Full text: PDF
|
|
Formal Verification is a “hot topic” for the user and vendor community. It has moved from the research community to the industrial domain in a very short time. Everyone wants to know more about how effective the techniques are. This experienced user panel will attempt to address your concerns in an open and frank way. They will give their personal opinions and not the commercial hype that so often heralds a new era.
|
|
|
What's between simulation and formal verification? (extended abstract) |
| |
David L. Dill
|
|
Pages: 328-329 |
|
doi>10.1145/277044.277138 |
|
Full text: PDF
|
|
This embedded tutorial surveys some possibilities for verification techniques that combine conventional simulation and ideas, techniques, and algorithms from formal verification, to obtain better functional test coverage of large designs.
|
|
|
Optimal FPGA mapping and retiming with efficient initial state computation |
| |
Jason Cong,
Chang Wu
|
|
Pages: 330-335 |
|
doi>10.1145/277044.277139 |
|
Full text: PDF
|
|
For sequential circuits with given initial states, new equivalent initial states must be computed for retiming, which unfortunately is NP-hard. In this paper we propose a novel polynomial time algorithm for optimal FPGA mapping with forward retiming to minimize the clock period with guaranteed initial state computation. It enables a new methodology of separating forward retiming from backward retiming to avoid time-consuming iterations between retiming and initial state computation. Our algorithm compares very favorably with both of the conventional approaches of separate mapping followed by retiming [1, 8] and the recent approaches of combined mapping with retiming [12, 2]. It is also applicable to circuits with partial initial state assignment.
|
|
|
M32: a constructive multilevel logic synthesis system |
| |
Victor N. Kravets,
Karem A. Sakallah
|
|
Pages: 336-341 |
|
doi>10.1145/277044.277140 |
|
Full text: PDF
|
|
We describe a new constructive multilevel logic synthesis system that integrates the traditionally separate technology-independent and technology-dependent stages of modern synthesis tools. Dubbed M32, this system is capable of generating circuits incrementally based on both functional as well as structural considerations. This is achieved by maintaining a dynamic structural representation of the evolving implementation and by refining it through progressive introduction of gates from a target technology library. Circuit construction proceeds from the primary inputs towards the primary outputs. Preliminary experimental results show that circuits generated using this approach are generally superior to those produced by multi-stage synthesis.
|
|
|
Efficient Boolean division and substitution |
| |
Shih-Chieh Chang,
David Ihsin Cheng
|
|
Pages: 342-347 |
|
doi>10.1145/277044.277141 |
|
Full text: PDF
|
|
Boolean division, and hence Boolean substitution, produces better results than algebraic division and substitution. However, due to the lack of an efficient Boolean division algorithm, Boolean substitution has rarely been used. We present an efficient Boolean division and substitution algorithm. Our technique is based on the philosophy of redundancy addition and removal. By adding multiple wires/gates in a specialized way, we tailor the philosophy to the Boolean division and substitution problem. From the viewpoint of traditional division/substitution, our algorithm can perform substitution not only in sum-of-product form but also in product-of-sum form. Our algorithm can also naturally take all types of don't cares into consideration. As far as substitution is concerned, we also discuss the case where we are allowed to decompose not only the dividend but also the divisor. Experiments are presented and the results are promising.
|
|
|
Delay-optimal technology mapping by DAG covering |
| |
Yuji Kukimoto,
Robert K. Brayton,
Prashant Sawkar
|
|
Pages: 348-351 |
|
doi>10.1145/277044.277142 |
|
Full text: PDF
|
|
We propose an algorithm for minimal-delay technology mapping for library-based designs. We show that subject graphs need not be decomposed into trees for delay minimization; they can be mapped directly as DAGs. Experimental results demonstrate that significant delay improvement is possible by this new approach.
|
|
|
A fast fanout optimization algorithm for near-continuous buffer libraries |
| |
David S. Kung
|
|
Pages: 352-355 |
|
doi>10.1145/277044.277143 |
|
Full text: PDF
|
|
This paper presents a gain-based fanout optimization algorithm for near-continuous buffer libraries. A near-continuous buffer library contains many buffers in a wide range of discrete sizes and each buffer of a specific type satisfies a size-independent delay equation. The new fanout algorithm is derived from an optimal algorithm for a special fanout optimization problem for continuous libraries. The gain-based technique constructs fanout trees which have better timing at similar area cost. Since no combinatorial search over buffer sizes or fanout tree topologies is used, our execution time is up to 1000 times faster when compared to conventional fanout algorithms.
|
|
|
Performance driven multi-layer general area routing for PCB/MCM designs |
| |
Jason Cong,
Patrick H. Madden
|
|
Pages: 356-361 |
|
doi>10.1145/277044.277144 |
|
Full text: PDF
|
|
In this paper we present a new global router appropriate for Multichip Module (MCM) and dense Printed Circuit Board (PCB) design, which utilizes a hybrid of the classical rip-up and reroute approach, and the more recent iterative deletion [9] method. The global router addresses performance issues by utilizing recent results in high performance interconnect design, while still effectively minimizing global congestion.
With experiments on the maze-routing component of our global router, we show that the choice of routing cost functions can have a significant impact on final solution quality. The results of a number of previously proposed routers may be improved dramatically by adopting the cost functions we suggest here. We also find little evidence of the “net ordering problem” when our cost functions and routing model are applied. The iterative deletion method is shown to improve global solution quality, particularly when high performance interconnect is required. We evaluate the performance of our global router by comparing the congestion of routes produced by our global router to those of a well known MCM router, V4R [14].
Our global router, MINOTAUR, supports arbitrary numbers of routing layers, differing capacities for each layer, pre-existing congestion and obstacles, and high performance interconnect structures (including those which require variable width interconnect).
|
|
|
Buffer insertion for noise and delay optimization |
| |
Charles J. Alpert,
Anirudh Devgan,
Stephen T. Quay
|
|
Pages: 362-367 |
|
doi>10.1145/277044.277145 |
|
Full text: PDF
|
|
Buffer insertion has successfully been applied to reduce delay in global interconnect paths; however, existing techniques only optimize delay and timing slack. With the increasing ratio of coupling to total capacitance and the use of aggressive dynamic logic circuit families, noise is becoming a major design bottleneck. We present comprehensive buffer insertion techniques for noise and delay optimization. Our experiments on a microprocessor design show that our approach fixes all noise violations that were identified by a detailed, simulation-based noise analysis tool. Further, we show that the performance penalty induced by optimizing both delay and noise as opposed to only delay is 2%.
|
|
|
Table-lookup methods for improved performance-driven routing |
| |
John Lillis,
Premal Buch
|
|
Pages: 368-373 |
|
doi>10.1145/277044.277146 |
|
Full text: PDF
|
|
The inaccuracy of Elmore delay [3] for interconnect delay estimation is well-documented. However, it remains a popular delay measure to drive performance optimization procedures such as wire-sizing and topology construction. This paper studies the merits of incorporating “better-than-Elmore” delay measures into the optimization process. The proposed delay metrics use a table-lookup method to incorporate better load modeling and approximate the effect of signal slew. We demonstrate that the proposed metrics exhibit a much narrower error distribution than Elmore delay, eliminating Elmore's frequent gross delay over-estimation. Finally, we show the improvement in solution quality which can be had by incorporating the new metrics into a timing driven topology construction algorithm.
|
|
|
Global routing with crosstalk constraints |
| |
Hai Zhou,
D. F. Wong
|
|
Pages: 374-377 |
|
doi>10.1145/277044.277147 |
|
Full text: PDF
|
|
Due to the scaling down of device geometry and increasing frequency in deep sub-micron designs, crosstalk between interconnection wires has become an important issue in VLSI layout design. In this paper, we consider crosstalk avoidance during global routing. We present a global routing algorithm based on a new Steiner tree formulation and the Lagrangian relaxation technique. We also give theoretical results on the complexity of the problem.
|
|
|
Timing and crosstalk driven area routing |
| |
Hsiao-Ping Tseng,
Louis Scheffer,
Carl Sechen
|
|
Pages: 378-381 |
|
doi>10.1145/277044.277148 |
|
Full text: PDF
|
|
We present a timing and crosstalk driven router for the chip assembly task that is applied between global and detailed routing. Our new approach aims to process the crosstalk and timing constraints by ordering nets and tuning wire spacing in a quantitative way. Our graph-based optimizer preroutes wires on the global routing grids incrementally in two stages - net order assignment and space relaxation. The timing delay of each critical path is calculated taking into account interconnect coupling capacitance. The objective is to reduce the delays of critical nets with negative timing slack values, by tuning net ordering and adding extra wire spacing. The router achieves a remarkable 8.4-25% delay reduction on MCNC benchmarks for a wire geometric ratio of 2.0, against the 33% reduction attainable if interconnect interference were eliminated entirely.
|
|
|
Process multi-circuit optimization |
| |
Arun Lokanathan,
Jay Brockman
|
|
Pages: 382-387 |
|
doi>10.1145/277044.277149 |
|
Full text: PDF
|
|
This paper describes the implementation of a concurrent methodology for integrated circuit optimization that spans the fabrication process design and circuit design disciplines. Results from this methodology show substantial performance gains as compared to a separate process/circuit optimization and a substantial time savings over an “all-at-once” combined optimization.
|
|
|
Migration: a new technique to improve synthesized designs through incremental customization |
| |
Rajendran Panda,
Abhijit Dharchoudhury,
Tim Edwards,
Joe Norton,
David Blaauw
|
|
Pages: 388-391 |
|
doi>10.1145/277044.277150 |
|
Full text: PDF
|
|
A novel technique to explore the performance vs design effort trade-off is proposed. Starting from an optimally synthesized design, performance-critical cells are incrementally and optimally selected and custom-sized to generate this trade-off. Efficient algorithms for the optimal selection, and for improving the reuse of custom-sized cells in the design are given. Significant performance gains are shown in several real circuits through the addition of very few customized cells.
|
|
|
A practical repeater insertion method in high speed VLSI circuits |
| |
Julian Culetu,
Chaim Amir,
John MacDonald
|
|
Pages: 392-395 |
|
doi>10.1145/277044.277151 |
|
Full text: PDF
|
|
In today's design of high speed VLSI circuits, frequency has a major impact on the number of repeaters that need to be inserted. A microprocessor operating at less than 200 MHz might require several hundred repeaters, while one operating at greater than 500 MHz may require thousands. This paper describes an efficient and simple way to automatically determine buffer placement based on maintaining equal transition time for all gate input signals across the net. A maximum allowable transition time is determined (limited by the frequency of the circuit) and correlated with the interconnect Elmore delay. A Spice RC model having nodes with physical locations (X, Y coordinates) can be obtained by extraction tools providing standard parasitic format (SPF). This can then be used with the results of the algorithm for repeater placement to determine the exact physical location desired for each repeater.
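The Elmore delay the abstract correlates with transition time has a simple closed form on an RC chain: each capacitor contributes its capacitance times the total resistance between it and the driver. A minimal sketch (the segment values are hypothetical; this is the textbook formula, not the authors' tool):

```python
# Hedged sketch: Elmore delay at the far end of an RC ladder.
# delay = sum over nodes i of (resistance from driver to node i) * C_i
def elmore_delay(rs, cs):
    delay, r_path = 0.0, 0.0
    for r, c in zip(rs, cs):
        r_path += r          # resistance accumulated from the driver
        delay += r_path * c  # each capacitor sees its upstream resistance
    return delay

# hypothetical 3-segment wire: 10 ohms and 0.1 pF per segment
print(elmore_delay([10.0, 10.0, 10.0], [1e-13, 1e-13, 1e-13]))
```

Splitting such a wire with a repeater shortens each RC chain, which is why the delay, quadratic in unbuffered wire length under this model, drops when repeaters are inserted.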
|
|
|
Practical experiences with standard-cell based datapath design tools: do we really need regular layouts? |
| |
Paolo Ienne,
Alexander Grießing
|
|
Pages: 396-401 |
|
doi>10.1145/277044.277152 |
|
Full text: PDF
|
|
Commercial tools for standard-cell based datapath design are here classed according to design flows, and the advantages of each class are discussed with the results of two test circuits. Algorithmic generation of netlists and of relative cell placement can help reduce area but, contrary to common belief, often appears detrimental to speed. Extraction of regularity from synthesized netlists is difficult and requires counterproductive simplifications to the synthesis process. Most promising are synthesis tools which can generate placement data; yet, no tool of this class appears ready today.
|
|
|
A statistical performance simulation methodology for VLSI circuits |
| |
Michael Orshansky,
James C. Chen,
Chenming Hu
|
|
Pages: 402-407 |
|
doi>10.1145/277044.277153 |
|
Full text: PDF
|
|
A statistical performance simulation (SPS) methodology for VLSI circuits is presented. Traditional methods of worst-case corner analysis lack accuracy and Monte-Carlo simulations cannot be applied to VLSI circuits because of their complexity. SPS methodology is accurate because no statistical information about the device parameter variation is lost. It achieves efficiency by analyzing the smaller circuit blocks and generating the performance distribution for the entire circuit. Circuit evaluation at any specified performance level is possible.
|
|
|
RF IC design challenges |
| |
Behzad Razavi
|
|
Pages: 408-413 |
|
doi>10.1145/277044.277154 |
|
Full text: PDF
|
|
This paper describes the challenges in designing RF integrated circuits for wireless transceiver applications. Receiver architectures such as heterodyne, homodyne, and image-reject topologies are presented and two transmitter architectures, namely, one-step and two-step configurations are studied. The design of building blocks such as low-noise amplifiers, mixers, and oscillators is also considered.
|
|
|
Tools and methodology for RF IC design |
| |
Al Dunlop,
Alper Demir,
Peter Feldmann,
Sharad Kapur,
David Long,
Robert Melville,
Jaijeet Roychowdhury
|
|
Pages: 414-420 |
|
doi>10.1145/277044.277155 |
|
Full text: PDF
|
|
We describe powerful new techniques for the analysis of RF circuits. Next-generation CAD tools based on such techniques should enable RF designers to obtain a more accurate picture of how their circuits will operate. These new simulation capabilities will be essential in order to reduce the number of design iterations needed to produce complex RFICs.
|
|
|
Electromagnetic modeling and signal integrity simulation of power/ground networks in high speed digital packages and printed circuit boards |
| |
Frank Y. Yuan
|
|
Pages: 421-426 |
|
doi>10.1145/277044.277164 |
|
Full text: PDF
|
|
The electromagnetic modeling and parameter extraction of digital packages and PCB boards for system signal integrity applications are presented. A systematic approach to analyze complex power/ground structures and simulate their effects on digital systems is developed. First, an integral equation boundary element algorithm is applied to the electromagnetic modeling of the PCB structures. Then, equivalent circuits of the power/ground networks are extracted from the EM solution. In an integrated simulation scheme, the equivalent circuits are combined with signal nets, package models, device circuits, and other external circuitry for system level signal integrity analysis and simulation. This methodology has been implemented as software tools and applied to practical design problems. Effects related to power/ground networks, such as simultaneous switching noises, crosstalk, and ground discontinuity are analyzed for realistic designs.
|
|
|
Efficient coloring of a large spectrum of graphs |
| |
Darko Kirovski,
Miodrag Potkonjak
|
|
Pages: 427-432 |
|
doi>10.1145/277044.277165 |
|
Full text: PDF
|
|
We have developed a new algorithm and software for graph coloring by systematically combining several algorithm and software development ideas that had crucial impact on the algorithm's performance. The algorithm explores the divide-and-conquer paradigm, global search for constrained independent sets using a computationally inexpensive objective function, assignment of most-constrained vertices to least-constraining colors, reuse and locality exploration of intermediate solutions, search time management, post-processing lottery-scheduling iterative improvement, and statistical parameter determination and validation. The algorithm was tested on a set of real-life examples. We found that hard-to-color real-life examples are common especially in domains where problem modeling results in denser graphs. Systematic experimentations demonstrated that for numerous instances the algorithm outperformed all other implementations reported in literature in solution quality and run-time.
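The assignment of most-constrained vertices to least-constraining colors can be sketched with a small greedy routine. This illustrates that one heuristic only, with a first-fit color choice as a simplification; the authors' algorithm combines it with divide-and-conquer, lottery-scheduling iterative improvement, and the other techniques listed above.

```python
def greedy_color(adj):
    """Greedy coloring sketch: adj maps vertex -> set of neighbours.

    Repeatedly picks the most-constrained uncolored vertex (most distinct
    neighbour colors, ties broken by degree) and gives it the smallest
    color unused by its neighbours. Returns a dict vertex -> color.
    """
    color = {}
    uncolored = set(adj)
    while uncolored:
        # most-constrained vertex: highest saturation, then highest degree
        v = max(uncolored,
                key=lambda u: (len({color[n] for n in adj[u] if n in color}),
                               len(adj[u])))
        used = {color[n] for n in adj[v] if n in color}
        c = 0
        while c in used:      # first-fit color choice (simplification)
            c += 1
        color[v] = c
        uncolored.remove(v)
    return color
```

On a triangle plus a pendant vertex this finds a valid 3-coloring, the optimum for that graph.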
|
|
|
Arithmetic optimization using carry-save-adders |
| |
Taewhan Kim,
William Jao,
Steve Tjiang
|
|
Pages: 433-438 |
|
doi>10.1145/277044.277166 |
|
Full text: PDF
|
|
Carry-save-adder(CSA) is the most often used type of operation in implementing a fast computation of arithmetics of register-transfer level design in industry. This paper establishes a relationship between the properties of arithmetic computations and several optimizing transformations using CSAs to derive consistently better qualities of results than those of manual implementations. In particular, we introduce two important concepts, operation-duplication and operation-split, which are the main driving techniques of our algorithm for achieving an extensive utilization of CSAs. Experimental results from a set of typical arithmetic computations found in industry designs indicate that automating CSA optimization with our algorithm produces designs with significantly faster timing and less area.
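The carry-save idea behind these transformations can be shown at the bit level: three addends are reduced to a sum word and a carry word with no carry propagation, and a single conventional adder finishes the job. This is a sketch of the arithmetic only, not of the paper's operation-duplication and operation-split algorithm.

```python
def csa(a, b, c):
    """One carry-save level: reduce three addends to (sum, carry).

    Per bit position this is a full adder: XOR gives the sum bit,
    majority gives the carry bit, and carries feed the next position.
    """
    s = a ^ b ^ c                          # sum bits, no propagation
    carry = (a & b) | (b & c) | (a & c)    # majority = carry bits
    return s, carry << 1                   # shift into next bit position

def add3(a, b, c):
    """Add three nonnegative integers via one CSA level plus a final add."""
    s, cy = csa(a, b, c)
    return s + cy                          # single carry-propagate add
```

Because `a + b + c == (a ^ b ^ c) + 2 * majority(a, b, c)` holds bitwise, the reduction is exact.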
|
|
|
Synthesis of power-optimized and area-optimized circuits from hierarchical behavioral descriptions |
| |
Ganesh Lakshminarayana,
Niraj K. Jha
|
|
Pages: 439-444 |
|
doi>10.1145/277044.277167 |
|
Full text: PDF
|
|
We present a technique for synthesizing power- as well as area-optimized circuits from hierarchical data flow graphs under throughput constraints. We allow for the use of complex RTL modules, such as FFTs and filters, as building blocks for the RTL circuit, in addition to simple RTL modules such as adders and multipliers. Unlike past techniques in the area, we also customize the complex RTL modules to match the environment in which they find themselves. We present a fast and efficient algorithm for mapping multiple behaviors onto the same RTL module during the course of synthesis, thus allowing our synthesis system to explore previously unexplored regions of the design space. These techniques are at the core of an iterative improvement based approach which can accept temporary degradation in solution quality in its quest for a globally optimal solution. The moves in our iterative improvement procedure explore optimizations along different dimensions such as functional unit selection, resource allocation, resource sharing, resource splitting, and selection and resynthesis of complex RTL modules. These inter-related optimizations are dynamically traded off with each other during the course of synthesis, thus exploiting the benefits that arise from their interaction. The synthesis framework also tackles other related high-level synthesis tasks such as scheduling, clock selection, and Vdd selection. Experimental results demonstrate that our algorithm produces circuits whose area and power consumption are comparable to or better than those produced using flattened synthesis, within much shorter CPU times. The efficacy of our algorithm in the power-optimization mode is illustrated by the fact that it produces circuits that consume up to 6.7 times less power than area-optimized circuits working at 5 Volts at area overheads not exceeding 50%.
|
|
|
Approximation and decomposition of binary decision diagrams |
| |
Kavita Ravi,
Kenneth L. McMillan,
Thomas R. Shiple,
Fabio Somenzi
|
|
Pages: 445-450 |
|
doi>10.1145/277044.277168 |
|
Full text: PDF
|
|
Efficient techniques for the manipulation of Binary Decision Diagrams (BDDs) are key to the success of formal verification tools. Recent advances in reachability analysis and model checking algorithms have emphasized the need for efficient algorithms for the approximation and decomposition of BDDs. In this paper we present a new algorithm for approximation and analyze its performance in comparison with existing techniques. We also introduce a new decomposition algorithm that produces balanced partitions. The effectiveness of our contributions is demonstrated by improved results in reachability analysis for some hard problem instances.
|
|
|
Approximate reachability with BDDs using overlapping projections |
| |
Shankar G. Govindaraju,
David L. Dill,
Alan J. Hu,
Mark A. Horowitz
|
|
Pages: 451-456 |
|
doi>10.1145/277044.277169 |
|
Full text: PDF
|
|
Approximate reachability techniques trade off accuracy with the capacity to deal with bigger designs. Cho et al [3] proposed approximate FSM traversal algorithms over a partition of the set of state bits. In this paper we generalize it by allowing projections onto a collection of nondisjoint subsets of the state variables. We establish the advantage of having overlapping projections and present a new multiple constrain function for BDDs, to compute efficiently the approximate image during symbolic forward propagation using overlapping projections. We demonstrate the effectiveness of this new algorithm by applying it to several control modules from the I/O unit in the Stanford FLASH Multiprocessor. We also present our results on the larger ISCAS 89 benchmarks.
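A toy illustration of the idea, with explicit sets of state tuples standing in for BDDs: the state set is kept only as its projections onto (possibly overlapping) variable subsets, and a state belongs to the over-approximation iff every projection contains its restriction. The example states and subsets are made up, and the paper's symbolic image computation is not shown.

```python
def project(states, idx):
    """Project a set of state tuples onto the variable positions in idx."""
    return {tuple(s[i] for i in idx) for s in states}

def in_approx(state, projections):
    """A state is in the over-approximation iff every projection contains
    its restriction to that projection's variables."""
    return all(tuple(state[i] for i in idx) in p for idx, p in projections)

states = {(0, 0, 1), (1, 1, 0)}          # exact reachable set (toy example)
overlapping = [(0, 1), (1, 2)]           # nondisjoint variable subsets
approx = [(idx, project(states, idx)) for idx in overlapping]
```

With disjoint singleton subsets (a partition, as in the earlier approach) the approximation admits spurious states such as `(1, 0, 1)` that the overlapping subsets above rule out, which is the kind of advantage the abstract claims for overlapping projections.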
|
|
|
Incremental CTL model checking using BDD subsetting |
| |
Abelardo Pardo,
Gary D. Hachtel
|
|
Pages: 457-462 |
|
doi>10.1145/277044.277171 |
|
Full text: PDF
|
|
An automatic abstraction/refinement algorithm for symbolic CTL model checking is presented. Conservative model checking is thus done for the full CTL language; no restriction is made to the universal or existential fragments. The algorithm begins with conservative verification of an initial abstraction. If the conclusion is negative, it derives a &#8220;goal set&#8221; of states which require further resolution. It then successively refines, with respect to this goal set, the approximations made in the sub-formulas, until the given formula is verified or computational resources are exhausted. This method applies uniformly to abstractions based on over-approximations as well as under-approximations of the model. Both the refinement and the abstraction procedures are based on BDD-subsetting. Note that refinement procedures which are based on error traces are limited to over-approximation on the universal fragment (or to language containment), whereas the goal set method is applicable to all consistent approximations, and to all CTL formulas.
|
|
|
PRIMO: probability interpretation of moments for delay calculation |
| |
Rony Kay,
Lawrence Pileggi
|
|
Pages: 463-468 |
|
doi>10.1145/277044.277172 |
|
Full text: PDF
|
|
Moments of the impulse response are widely used for interconnect delay analysis, from the explicit Elmore delay (first moment of the impulse response) expression, to moment matching methods which create reduced order transimpedance and transfer function approximations. However, the Elmore delay is fast becoming ineffective for deep submicron technologies, and reduced order transfer function delays are impractical for use as early-phase design metrics or as design optimization cost functions. This paper describes an approach for fitting moments of the impulse response to probability density functions so that delays can be estimated from probability tables. For RC trees it is demonstrated that the incomplete gamma function provides a provably stable approximation. The step response delay is obtained from a one-dimensional table lookup.
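The Elmore delay the paper starts from is the first moment of the impulse response; in an RC tree, each resistor contributes R times the total capacitance charged through it. A minimal sketch for the special case of an RC chain, with illustrative values; the paper's actual contribution, fitting moments to probability densities such as the incomplete gamma, is not shown.

```python
def elmore_delay_chain(segments):
    """Elmore delay at the leaf of an RC chain.

    segments: list of (R, C) pairs ordered from driver to leaf, where
    each C hangs at the node just below its R. Each resistor R_i sees
    all capacitance downstream of it, so the leaf delay is
    sum_i R_i * (C_i + C_{i+1} + ...).
    """
    delay = 0.0
    downstream = sum(c for _, c in segments)  # total capacitance below driver
    for r, c in segments:
        delay += r * downstream               # R_i charges everything below it
        downstream -= c                       # drop C_i before the next stage
    return delay
```

For two identical unit segments the delay is 1*2 + 1*1 = 3, matching the hand calculation.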
|
|
|
ftd: an exact frequency to time domain conversion for reduced order RLC interconnect models |
| |
Ying Liu,
Lawrence T. Pileggi,
Andrzej J. Strojwas
|
|
Pages: 469-472 |
|
doi>10.1145/277044.277174 |
|
Full text: PDF
|
|
Recursive convolution provides an exact solution for interfacing reduced-order frequency domain representations with discrete time domain models of piecewise linear voltage waveforms. The state-space method is more efficient, but not exact, and can sometimes produce large time domain errors. This paper presents a new algorithm, ftd (frequency to time domain), for incorporating linear frequency domain macro-models into time domain simulators. ftd provides accuracy equivalent to recursive convolution with efficiency that is superior to the state-space methods.
|
|
|
Extending moment computation to 2-port circuit representations |
| |
Fang-Jou Liu,
Chung-Kuan Cheng
|
|
Pages: 473-476 |
|
doi>10.1145/277044.277176 |
|
Full text: PDF
|
|
In this paper, we present an extension of moment computation to 2-port circuits. Our formulas are applicable to both transfer function moments and driving-point admittance moments. Given the input admittances, output admittances, and transfer functions of two 2-ports, our formulas compute the input admittance, output admittance, and transfer function when these 2-ports are combined either in parallel or in series. A nice conclusion of our work is the discovery that our formulas form an elegant framework integrating the results from two classical papers, by Rubinstein et al. and by O'Brien and Savarino, for computing the Elmore delay and driving-point admittance moments in RC trees.
|
|
|
Adjoint transient sensitivity computation in piecewise linear simulation |
| |
Tuyan V. Nguyen,
Anirudh Devgan,
Ognen J. Nastov
|
|
Pages: 477-482 |
|
doi>10.1145/277044.277177 |
|
Full text: PDF
|
|
This paper presents a general method for computing transient sensitivities using the adjoint method in event driven simulation algorithms that employ piecewise linear device models. Sensitivity information provides first order assessment of circuit variability with respect to design variables and parasitics. This information is particularly useful for noise stability analysis, timing rule generation, and circuit optimization. Techniques for incorporating adjoint transient sensitivity into ACES, a general piecewise linear simulator, are presented. Sensitivity computation includes algorithms to handle instantaneous charge redistribution due to the discontinuous conductance models of the piecewise linear elements, and the loss of simulation accuracy due to the non-monotonous responses in autonomous adjoint circuits with non-zero initial conditions. Results demonstrate the efficiency and accuracy of the proposed techniques.
|
|
|
Design methodology of ultra low-power MPEG4 codec core exploiting voltage scaling techniques |
| |
Kimiyoshi Usami,
Mutsunori Igarashi,
Takashi Ishikawa,
Masahiro Kanazawa,
Masafumi Takahashi,
Mototsugu Hamada,
Hideho Arakida,
Toshihiro Terazawa,
Tadahiro Kuroda
|
|
Pages: 483-488 |
|
doi>10.1145/277044.277178 |
|
Full text: PDF
|
|
This paper describes a fully automated low-power design methodology in which three different voltage-scaling techniques are combined. Supply voltage is scaled globally, selectively, and adaptively while maintaining performance. This methodology enabled us to design an MPEG4 codec core with 58% less power than the original in a three-week turnaround time.
|
|
|
Design and optimization of low voltage high performance dual threshold CMOS circuits |
| |
Liqiong Wei,
Zhanping Chen,
Mark Johnson,
Kaushik Roy,
Vivek De
|
|
Pages: 489-494 |
|
doi>10.1145/277044.277179 |
|
Full text: PDF
|
|
Reduction in leakage power has become an important concern in low voltage, low power and high performance applications. In this paper, we use a dual threshold technique to reduce leakage power by assigning high threshold voltage to some transistors in non-critical paths, and using low-threshold transistors in critical paths. In order to achieve the best leakage power saving under target performance constraints, an algorithm is presented for selecting and assigning an optimal high threshold voltage. A general standby leakage current model, verified against HSPICE, is used to estimate standby leakage power. Results show that the dual threshold technique is good for power reduction during both standby and active modes. The standby leakage power savings for some ISCAS benchmarks can be more than 50%.
|
|
|
MTCMOS hierarchical sizing based on mutual exclusive discharge patterns |
| |
James Kao,
Siva Narendra,
Anantha Chandrakasan
|
|
Pages: 495-500 |
|
doi>10.1145/277044.277180 |
|
Full text: PDF
|
|
Multi-threshold CMOS is a popular circuit style that will provide high performance and low power operation. Optimally sizing the gating sleep transistor to provide adequate performance is difficult because the overall delay characteristics are strongly dependent on the discharge patterns of internal gates. This paper proposes a methodology for sizing the sleep transistor for a large module based on mutual exclusive discharge patterns of internal blocks. This algorithm can be applied at all levels of a circuit hierarchy, where the internal blocks can represent transistors, cells within an array, or entire modules. This methodology will give an upper bound for the sleep transistor size required to meet any performance constraint.
|
|
|
Technical challenges of IP and system-on-chip (panel): the ASIC vendor perspective |
| |
A. Richard Newton
|
|
Page: 501 |
|
doi>10.1145/277044.277181 |
|
Full text: PDF
|
|
The vision of easily accessible IP that can be quickly integrated on silicon as “virtual components” is a compelling one, with deep implications for reuse methodology and EDA technology. Activities of the VSI Alliance, starting nearly two years ago, have fueled interest in IP and raised market expectations for value and reusability to very high levels. Indeed, the sheer number of new IP companies, in combination with third-party ASIC libraries and EDA tools offerings, suggests that the era of plug and play IP has arrived. Today, the claimed independence of IP from the underlying silicon has led to bold market claims for foundry manufactured system chips using internet sourced IP with Lego block, snap-together integration simplicity. Traditional roles of foundries, ASIC suppliers, and EDA vendors are blurring. To the end customer, the situation presents both opportunities and risks. Are we on the cusp of a new era, or a rude awakening?
From the perspective of leading ASIC vendors, the evolution of IP-based systems on silicon, along with the means and methods of producing them, are not new. Full use of rapidly-increasing raw ASIC gate counts — which is synonymous with systems on silicon — has been a driving force in the ASIC industry for many years. As developers of high-value IP and systems architectures, the leading ASIC companies have an established record of experience with design and system chip level integration. This experience — e.g., with system-level analysis, integration of cores designed to common bus-structures, etc. — can be leveraged toward practical realization of an IP vision that offers the silicon consumer more choice without unacceptable implementation risks. On the other hand, from the perspective of a pure IP integrator, IP is poised to change the way customers and ASIC vendors themselves produce systems-on-silicon, just as vendor-specific tools evolved to commercial CAE starting a decade ago. Early IP success stories are pointing the way to a new, more specialized business model where IP, tools, services, and silicon foundry are brought together based on the specific needs of a given project.
This panel will address technical challenges presented by the systems-chip opportunity, as well as practical expectations and key missing pieces of industry infrastructure. Such pieces include: customer expectations, EDA technology, standards, legal barriers and associated risks facing ASIC suppliers and EDA vendors, challenges of incorporating 3rd-party IP, and practical reuse methodologies. In this forum, a noteworthy challenge — which has remained constant throughout the evolution to deep-submicron — is the ability for customers to develop their systems when faced with a widening gap between process technology and commercial EDA tools. As silicon suppliers continue to focus on providing the industry with advanced technologies and products, EDA suppliers must also increase their focus on providing tools and infrastructure. Another example challenge, faced by ASIC vendors, lies in supplying the high-level core models required for emerging cycle-based simulators, and supporting the use of cores in emulators and accelerators, when no standards for model creation or protection exist. Standard model interfaces such as those that exist for event-based simulators must be developed to support cycle simulators, emulators, accelerators as well as hardware-software co-verification tools. The panel will also examine essential factors to consider in the use of IP, based on their experience with the latest silicon process technologies. Finally, the ASIC vendor participants will showcase their joint efforts to build an effective IP infrastructure for the industry.
|
|
|
Software synthesis of process-based concurrent programs |
| |
Bill Lin
|
|
Pages: 502-505 |
|
doi>10.1145/277044.277182 |
|
Full text: PDF
|
|
We present a Petri net theoretic approach to the software synthesis problem that can synthesize ordinary C programs from process-based concurrent specifications without the need for a run-time multi-threading environment. The synthesized C programs can be readily retargeted to different processors using available optimizing C compilers. Our compiler can also generate sequential Java programs as output, which can also be readily mapped to a target processor without the need for a multi-threading environment. Initial results demonstrate significant potentials for improvements over current run-time solutions.
|
|
|
Don't care-based BDD minimization for embedded software |
| |
Youpyo Hong,
Peter A. Beerel,
Luciano Lavagno,
Ellen M. Sentovich
|
|
Pages: 506-509 |
|
doi>10.1145/277044.277183 |
|
Full text: PDF
|
|
This paper explores the use of don't cares in software synthesis for embedded systems. Embedded systems have extremely tight real-time and code/data size constraints, which make expensive optimizations desirable. We propose applying BDD minimization techniques in the presence of a don't care set to synthesize code for extended Finite State Machines from a BDD-based representation of the FSM transition function. The don't care set can be derived from local analysis (such as unused state codes or don't care inputs) as well as from external information (such as impossible input patterns). We show experimental results, discuss their implications and the interaction between BDD-based minimization and dynamic variable reordering, and propose directions for future work.
|
|
|
Instruction selection, resource allocation, and scheduling in the AVIV retargetable code generator |
| |
Silvina Hanono,
Srinivas Devadas
|
|
Pages: 510-515 |
|
doi>10.1145/277044.277184 |
|
Full text: PDF
|
|
The AVIV retargetable code generator produces optimized machine code for target processors with different instruction set architectures. AVIV optimizes for minimum code size.
Retargetable code generation requires the development of heuristic algorithms for instruction selection, resource allocation, and scheduling. AVIV addresses these code generation subproblems concurrently, whereas most current code generation systems address them sequentially. It accomplishes this by converting the input application to a graphical (Split-Node DAG) representation that specifies all possible ways of implementing the application on the target processor. The information embedded in this representation is then used to set up a heuristic branch-and-bound step that performs functional unit assignment, operation grouping, register bank allocation, and scheduling concurrently. While detailed register allocation is carried out as a second step, estimates of register requirements are generated during the first step to ensure high quality of the final assembly code.
We show that near-optimal code can be generated for basic blocks for different architectures within reasonable amounts of CPU time. Our framework thus allows us to accurately evaluate the performance of different architectures on application code.
|
|
|
Code compression for embedded systems |
| |
Haris Lekatsas,
Wayne Wolf
|
|
Pages: 516-521 |
|
doi>10.1145/277044.277185 |
|
Full text: PDF
|
|
Memory is one of the most restricted resources in many modern embedded systems. Code compression can provide substantial savings in terms of size. In a compressed code CPU, a cache miss triggers the decompression of a main memory block, before it gets transferred to the cache. Because the code must be decompressible starting from any point (or at least at cache block boundaries), most file-oriented compression techniques cannot be used. We propose two algorithms to compress code in a space-efficient and simple to decompress way, one which is independent of the instruction set and another which depends on the instruction set. We perform experiments on two instruction sets, a typical RISC (MIPS) and a typical CISC (x86) and compare our results to existing file-oriented compression algorithms.
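The block-boundary constraint can be made concrete with a toy dictionary scheme: each cache-block-sized chunk of instruction words is encoded on its own, so decompression can begin at any block boundary. This illustrates the constraint only; it is neither of the paper's two algorithms, and the instruction words below are invented.

```python
from collections import Counter

def build_dictionary(words, size=255):
    """Give the most frequent instruction words short dictionary indices."""
    return [w for w, _ in Counter(words).most_common(size)]

def compress_block(block, dictionary):
    """Encode one cache block independently: frequent words become
    dictionary references, everything else an escaped literal."""
    index = {w: i for i, w in enumerate(dictionary)}
    return [("D", index[w]) if w in index else ("L", w) for w in block]

def decompress_block(tokens, dictionary):
    """Decode a block with no state from any other block."""
    return [dictionary[x] if tag == "D" else x for tag, x in tokens]
```

Because no token depends on a preceding block, a cache miss can trigger decompression of exactly the missed block, which is the property the abstract requires.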
|
|
|
A decision procedure for bit-vector arithmetic |
| |
Clark W. Barrett,
David L. Dill,
Jeremy R. Levitt
|
|
Pages: 522-527 |
|
doi>10.1145/277044.277186 |
|
Full text: PDF
|
|
Bit-vector theories with concatenation and extraction have been shown to be useful and important for hardware verification. We have implemented an extended theory which includes arithmetic. Although deciding equality in such a theory is NP-hard, our implementation is efficient for many practical examples. We believe this to be the first such implementation which is efficient, automatic, and complete.
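The concatenation and extraction operators at the core of such theories have simple semantics over unsigned integers. A sketch of the operators themselves, with hypothetical helper names; this is not the decision procedure.

```python
def extract(v, hi, lo):
    """Bits hi..lo of v (inclusive, hi >= lo), i.e. v[hi:lo]."""
    return (v >> lo) & ((1 << (hi - lo + 1)) - 1)

def concat(hi_v, lo_v, lo_width):
    """Concatenate hi_v above lo_v, where lo_v is lo_width bits wide."""
    return (hi_v << lo_width) | (lo_v & ((1 << lo_width) - 1))
```

Algebraic identities such as `extract(concat(a, b, w), w - 1, 0) == b` are the kind of fact a decision procedure for this theory exploits.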
|
|
|
Functional vector generation for HDL models using linear programming and 3-satisfiability |
| |
Farzan Fallah,
Srinivas Devadas,
Kurt Keutzer
|
|
Pages: 528-533 |
|
doi>10.1145/277044.277187 |
|
Full text: PDF
|
|
Our strategy for automatic generation of functional vectors is based on exercising selected paths in the given hardware description language (HDL) model. The HDL model describes interconnections of arithmetic, logic and memory modules. Given a path in the HDL model, the search for input stimuli that exercise the path can be converted into a standard satisfiability checking problem by expanding the arithmetic modules into logic-gates. However, this approach is not very efficient.
We present a new HDL-satisfiability checking algorithm that works directly on the HDL model. The primary feature of our algorithm is a seamless integration of linear-programming techniques for feasibility checking of arithmetic equations that govern the behavior of datapath modules, and 3-SAT checking for logic equations that govern the behavior of control modules. This feature is critically important to efficiency, since it avoids module expansion and allows us to work with logic and arithmetic equations whose cardinality tracks the size of the HDL model.
We describe the details of the HDL-satisfiability checking algorithm in this paper. Experimental results which show significant speedups over state-of-the-art gate-level satisfiability checking methods are included.
|
|
|
Automatic generation of assertions for formal verification of PowerPC microprocessor arrays using symbolic trajectory evaluation |
| |
Li-C. Wang,
Magdy S. Abadir,
Nari Krishnamurthy
|
|
Pages: 534-537 |
|
doi>10.1145/277044.277188 |
|
Full text: PDF
|
|
For verifying complex sequential blocks such as microprocessor embedded arrays, the formal method of symbolic trajectory evaluation (STE) has achieved great success in the past [[3], [5], [6]]. Past STE methodology for arrays requires manual creation of “assertions” to which both the RTL view and the actual design should be equivalent. In this paper, we describe a novel method to automate the assertion creation process which improves the efficiency and the quality of array verification. Encouraging results on recent PowerPC arrays will be presented.
|
|
|
Combining theorem proving and trajectory evaluation in an industrial environment |
| |
Mark D. Aagaard,
Robert B. Jones,
Carl-Johan H. Seger
|
|
Pages: 538-541 |
|
doi>10.1145/277044.277189 |
|
Full text: PDF
|
|
We describe the verification of the IM: a large, complex (12,000 gates and 1100 latches) circuit that detects and marks the boundaries between Intel architecture (IA-32) instructions. We verified a gate-level model of the IM against an implementation-independent specification of IA-32 instruction lengths. We used theorem proving to derive 56 model-checking runs and to verify that the model-checking runs imply that the IM meets the specification for all possible sequences of IA-32 instructions. Our verification discovered eight previously unknown bugs.
|
|
|
A fast and low cost testing technique for core-based system-on-chip |
| |
Indradeep Ghosh,
Sujit Dey,
Niraj K. Jha
|
|
Pages: 542-547 |
|
doi>10.1145/277044.277190 |
|
Full text: PDF
|
|
This paper proposes a new methodology for testing a core-based system-on-chip (SOC), targeting the simultaneous reduction of test area overhead and test application time. Testing of embedded cores is achieved using the transparency properties of surrounding cores. At the core level, testability and transparency can be achieved by reusing existing logic inside the core, and providing different versions of the core having different area overheads and transparency latencies. At the chip level, the technique analyzes the topology of the SOC to select the core versions that best meet the user's desired test area overhead and test application time objectives. Application of the method to example SOCs demonstrates the ability to design highly testable SOCs with minimized test area overhead, minimized test application time, or a desired trade-off between the two. Significant reduction in area overhead and test application time compared to an existing SOC testing technique is also demonstrated.
|
|
|
Introducing redundant computations in a behavior for reducing BIST resources |
| |
Ishwar Parulkar,
Sandeep K. Gupta,
Melvin A. Breuer
|
|
Pages: 548-553 |
|
doi>10.1145/277044.277191 |
|
Full text: PDF
|
|
The degree of freedom that can be exploited during scheduling and assignment to minimize BIST resources is often limited by the data dependencies of a behavior. We propose transformation of a behavior by introducing redundant computations such that the resulting data path requires few BIST resources. The transformation makes use of spare capacity of modules to add redundancy that enables test paths to be shared among the modules. A technique is presented for introducing redundant computations that reduce the BIST resource requirements of a data path without compromising the latency and functional resource constraints.
|
|
|
A BIST scheme for RTL controller-data paths based on symbolic testability analysis |
| |
Indradeep Ghosh,
Niraj K. Jha,
Sudipta Bhawmik
|
|
Pages: 554-559 |
|
doi>10.1145/277044.277192 |
|
Full text: PDF
|
|
This paper introduces a novel scheme for testing register-transfer level controller/data paths using built-in self-test (BIST). The scheme uses the controller netlist and the data path of a circuit to extract a test control/data flow (TCDF) which consists of operations mapped to modules in the circuit and variables mapped to registers. This TCDF is used to derive a set of symbolic justification and propagation paths (known as test environment) to test some of the operations and variables present in it. If it becomes difficult to generate such test environments with the derived TCDFs, a few test multiplexers are added at suitable points in the circuit to increase its controllability and observability. This test environment can then be used to exercise a module or register in the circuit with pseudorandom pattern generators which are placed only at the primary inputs of the circuit. The test responses can be analyzed with signature analyzers which are only placed at the primary outputs of the circuit. Every module in the module library is made random-pattern testable, whenever possible, using gate-level testability insertion techniques. Finally, a BIST controller is synthesized to provide the necessary control signals to form the different test environments during testing, and a BIST architecture is superimposed on the circuit. Experimental results on a number of industrial and university benchmarks show that high fault coverage (>99%) can be obtained with our scheme in a small number of test cycles at an average area (delay) overhead of only 6.4% (2.5%).
|
|
|
Figures of merit to characterize the importance of on-chip inductance |
| |
Yehea I. Ismail,
Eby G. Friedman,
Jose L. Neves
|
|
Pages: 560-565 |
|
doi>10.1145/277044.277193 |
|
Full text: PDF
|
|
A closed form solution for the output signal of a CMOS inverter driving an RLC transmission line is presented. This solution is based on the alpha power law for deep submicrometer technologies. Two figures of merit are presented that are useful for determining if a section of interconnect should be modeled as either an RLC or an RC impedance. The damping factor of a lumped RLC circuit is shown to be a useful figure of merit. The second useful figure of merit considered in this paper is the ratio of the rise time of the input signal at the driver of an interconnect line to the time of flight of the signals across the line. AS/X circuit simulations of an RLC transmission line and a five section RC Π circuit based on a 0.25 µm IBM CMOS technology are used to quantify and determine the relative accuracy of an RC model. One primary result of this study is evidence demonstrating that a range for the length of the interconnect exists for which inductance effects are prominent. Furthermore, it is shown that under certain conditions, inductance effects are negligible despite the length of the section of interconnect.
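The two figures of merit named in the abstract can be sketched with textbook formulas: the damping factor of the equivalent lumped RLC circuit and the time of flight of a wave across the line. These are the standard definitions, not necessarily the exact expressions derived in the paper, and the wire parameters in the example are hypothetical illustrative values.

```python
import math

def rlc_figures_of_merit(r, l_ind, c, length, t_rise):
    """Evaluate two common figures of merit for an on-chip line.

    r, l_ind, c : per-unit-length resistance (ohm/m), inductance (H/m),
                  capacitance (F/m)
    length      : line length (m)
    t_rise      : rise time of the input signal at the driver (s)
    """
    # Damping factor of the equivalent lumped RLC circuit:
    # zeta = (R_total / 2) * sqrt(C_total / L_total)
    zeta = (r * length / 2.0) * math.sqrt(c / l_ind)
    # Time of flight of a signal across the line: t_f = length * sqrt(L*C)
    t_flight = length * math.sqrt(l_ind * c)
    # Heuristic decision: inductance matters when the response is
    # underdamped (zeta < 1) and the input edge is fast relative to the
    # round-trip line delay.
    inductance_matters = zeta < 1.0 and t_rise < 2.0 * t_flight
    return zeta, t_flight, inductance_matters

# Hypothetical deep-submicrometer global wire (illustrative values only):
# 30 ohm/mm, 0.5 nH/mm, 0.2 pF/mm, 2 mm long, 30 ps input edge.
zeta, t_f, use_rlc = rlc_figures_of_merit(
    r=30e3, l_ind=0.5e-6, c=200e-12, length=2e-3, t_rise=30e-12)
print(f"zeta={zeta:.2f}, t_flight={t_f*1e12:.1f} ps, model as RLC: {use_rlc}")
```

For this wire the line is underdamped and the edge is faster than twice the time of flight, so an RC model would miss inductive behavior; lengthening the same wire raises the damping factor and eventually makes the RC model adequate, matching the "range of lengths" observation in the abstract.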
|
|
|
Layout techniques for minimizing on-chip interconnect self inductance |
| |
Yehia Massoud,
Steve Majors,
Tareq Bustami,
Jacob White
|
|
Pages: 566-571 |
|
doi>10.1145/277044.277194 |
|
Full text: PDF
|
|
Because magnetic effects have a much longer spatial range than electrostatic effects, an interconnect line with large inductance will be sensitive to distant variations in interconnect topology. This long range sensitivity makes it difficult to balance delays in nets like clock trees, so for such nets inductance must be minimized. In this paper we use two- and three-dimensional electromagnetic field solvers to compare dedicated ground planes to a less area-consuming approach, interdigitating the signal line with ground lines. The surprising conclusion is that with very little area penalty, interdigitated ground lines are more effective at minimizing self-inductance than ground planes.
|
|
|
A practical approach to static signal electromigration analysis |
| |
Nagaraj NS,
Frank Cano,
Haldun Haznedar,
Duane Young
|
|
Pages: 572-577 |
|
doi>10.1145/277044.277195 |
|
Full text: PDF
|
|
It is commonly thought that sweep-back effects would make electromigration (EM) a non-issue in signal lines. However, this is only the case when the shapes of the positive and negative current pulses are closely matched. Moreover, as performance pressures increase, the peak current values are exceeding the range for which electromigration models are valid. Thus, during the design of TI's TMS320c6201 DSP chip, it was determined that limits needed to be placed on the current densities in signal-line segments, and that every net in the design should be checked. Dynamic current density analysis on all nets of a large design is computationally very expensive. In this paper, we describe a practical CAD methodology for a static, signal electromigration analysis for large cell-based designs. We present results and some observations from application of this methodology on the TMS320c6201.
|
|
|
Design productivity (panel): how to measure it, how to improve it |
| |
Carlos Dangelo
|
|
Pages: 578-579 |
|
doi>10.1145/277044.277196 |
|
Full text: PDF
|
|
“Design and Test” inputs and outputs can be viewed as generic interfaces between product creators (system designers) and IC manufacturing. The interface is actually a set of engineering processes which transform the design inputs (specifications and requirements) into a data set that drives IC manufacturing. There is a gap between advances in manufacturing and advances in design. Hence, the emphasis on design productivity. Design productivity can be viewed as a measure of the efficiency of the resources (people, skills, methodologies, computers, tools, libraries, etc.) when applied to these processes to meet design input requirements. Some resources and processes are “hard” and can be readily quantified, e.g., numbers of computers, tools, and people. Others are more elusive, e.g., organization skill set, “quality” of specification code, methodologies, etc. How to measure and improve design productivity is the topic addressed by the panel.
|
|
|
Hierarchical functional timing analysis |
| |
Yuji Kukimoto,
Robert K. Brayton
|
|
Pages: 580-585 |
|
doi>10.1145/277044.277197 |
|
Full text: PDF
|
|
We propose a hierarchical timing analysis technique for combinational circuits under the tightest known sensitization criterion, the XBDO delay model. Given a hierarchical combinational circuit, a generalized delay model of each leaf module is characterized first. Since this timing characterization step takes into account false paths in each module, the delay model is more accurate than the one obtained by topological analysis. Then topological delay analysis is performed on the circuit composed of generalized gates replacing the leaf modules, where the “gate” delay model is the derived one. As far as the authors know, this is the first result that shows that hierarchical analysis is possible under state-of-the-art tight sensitization criteria. We demonstrate by experimental results that loss of accuracy in using the hierarchical approach is very minimal in practice. The theory developed in this paper also provides a foundation for incremental timing analysis under accurate sensitization criteria.
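The final pass described above, topological analysis over "generalized gates," amounts to a longest-path arrival-time computation over a DAG. A minimal sketch, assuming each leaf module has already been characterized with pin-to-pin delays (the node names and delay values below are hypothetical, and the real characterization step is the false-path-aware part the paper contributes):

```python
def arrival_times(fanins, delay, order):
    """Topological (longest-path) arrival-time computation.

    fanins: {node: list of fanin nodes}; primary inputs have no fanins.
    delay:  {(fanin, node): pin-to-pin delay} -- in a hierarchical
            analysis these are the characterized delays of each leaf
            module, not raw gate delays.
    order:  nodes listed in topological order.
    """
    at = {}
    for n in order:
        fi = fanins.get(n, [])
        # Primary inputs arrive at time 0; otherwise take the latest
        # fanin arrival plus the characterized pin-to-pin delay.
        at[n] = 0.0 if not fi else max(at[f] + delay[(f, n)] for f in fi)
    return at

# Toy hierarchy: two characterized "modules" m1 and m2 feeding output z.
fanins = {'m1': ['a', 'b'], 'm2': ['b', 'c'], 'z': ['m1', 'm2']}
delay = {('a', 'm1'): 3, ('b', 'm1'): 5,
         ('b', 'm2'): 2, ('c', 'm2'): 4,
         ('m1', 'z'): 1, ('m2', 'z'): 2}
order = ['a', 'b', 'c', 'm1', 'm2', 'z']
print(arrival_times(fanins, delay, order)['z'])  # -> 6.0
```

Because false paths were removed during characterization, the pin-to-pin delays of m1 and m2 can be smaller than their topological worst case, which is where the accuracy gain over plain topological analysis comes from.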
|
|
|
Making complex timing relationships readable: Presburger formula simplification using don't cares |
| |
Tod Amon,
Gaetano Borriello,
Jiwen Liu
|
|
Pages: 586-590 |
|
doi>10.1145/277044.277198 |
|
Full text: PDF
|
|
Solutions to timing relationship analysis problems are often reported using symbolic variables and inequalities which specify linear relationships between the variables. Complex relationships can be expressed using Presburger formulas which allow Boolean relations to be specified between the inequalities. This paper develops and applies a highly effective simplification approach for Presburger formulas based on logic minimization techniques.
|
|
|
Delay estimation of VLSI circuits from a high-level view |
| |
Mahadevamurty Nemani,
Farid N. Najm
|
|
Pages: 591-594 |
|
doi>10.1145/277044.277199 |
|
Full text: PDF
|
|
Estimation of the delay of a Boolean function from its functional description is an important step towards design exploration at the register transfer level (RTL). This paper addresses the problem of estimating the delay of certain optimal multi-level implementations of combinational circuits, given only their functional description. The proposed delay model uses a new complexity measure called the delay measure to estimate the delay. It has an advantage that it can be used to predict both the minimum delay (associated with an optimum delay implementation) and the maximum delay (associated with an optimum area implementation) of a Boolean function without actually resorting to logic synthesis. The model is empirical and results demonstrating its feasibility and utility are presented.
|
|
|
TETA: transistor-level engine for timing analysis |
| |
Florentin Dartu,
Lawrence T. Pileggi
|
|
Pages: 595-598 |
|
doi>10.1145/277044.277200 |
|
Full text: PDF
|
|
TETA is an interconnect-centric waveform calculator that was optimized to achieve the utmost efficiency for analyzing logic stages comprised of transistors and large coupled RC(L) interconnect models. TETA applies a novel compaction for the transistor clusters and employs successive chord iterations to solve the resulting nonlinear equations. These algorithms permit the use of simple SIMO (single input multi-output) N-port interconnect models since macromodel passivity is not required. The successive chord analysis also enables TETA to avoid the N-port matrix factorization during nonlinear iterations and allows the use of simple table look-up models for MOS devices. Complex gates and nonlinear capacitors can be handled without loss of generality.
|
|
|
Validation with guided search of the state space |
| |
C. Han Yang,
David L. Dill
|
|
Pages: 599-604 |
|
doi>10.1145/277044.277201 |
|
Full text: PDF
|
|
In practice, model checkers are most useful when they find bugs, not when they prove a property. However, because large portions of the state space of the design actually satisfy the specification, model checkers devote much effort verifying correct portions of the design. In this paper, we enhance the bug-finding capability of a model checker by using heuristics to search first the states that are most likely to lead to an error. Reductions of 1 to 3 orders of magnitude in the number of states needed to find bugs in industrial designs have been observed. Consequently, these heuristics can extend the capability of model checkers to find bugs in designs.
|
|
|
Efficient state classification of finite state Markov chains |
| |
Aiguo Xie,
Peter A. Beerel
|
|
Pages: 605-610 |
|
doi>10.1145/277044.277202 |
|
Full text: PDF
|
|
This paper presents an efficient method for state classification of finite state Markov chains using BDD-based symbolic techniques. The method exploits the fundamental properties of a Markov chain and classifies the state space by iteratively applying reachability analysis. We compare our method with the current state-of-the-art technique which requires the computation of the transitive closure of the transition relation of a Markov chain. Experiments in over a dozen synchronous and asynchronous systems demonstrate that our method dramatically reduces the CPU time needed, and solves much larger problems because of reduced memory requirements.
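The reachability-based idea can be illustrated with an explicit-state sketch: a state of a finite Markov chain is recurrent exactly when it lies in a terminal SCC of the transition graph, i.e. every state reachable from it can reach it back. The paper does this symbolically with BDDs; the functions and toy chain below are illustrative, not the authors' algorithm.

```python
def reachable(graph, starts):
    """Forward reachable set from `starts` in a transition graph given
    as {state: set(successors)} (edges with nonzero probability)."""
    seen, stack = set(starts), list(starts)
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def classify(graph):
    """Split states into (recurrent, transient) sets.

    A state is recurrent iff every state reachable from it can reach
    it back, i.e. it lies in a terminal SCC.  Explicit-state sketch of
    the reachability-based criterion; a symbolic version would replace
    these set operations with BDD image computations.
    """
    recurrent, transient = set(), set()
    for s in graph:
        fwd = reachable(graph, {s})
        if all(s in reachable(graph, {t}) for t in fwd):
            recurrent.add(s)
        else:
            transient.add(s)
    return recurrent, transient

# Toy chain: 0 -> 1 -> 2 <-> 3 ; states 2 and 3 form a terminal SCC,
# so they are recurrent and 0, 1 are transient.
print(classify({0: {1}, 1: {2}, 2: {3}, 3: {2}}))
```

The naive version recomputes reachability per state; the point of the symbolic method is to do this classification iteratively on sets of states at once, which is what makes large chains tractable.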
|
|
|
An implicit algorithm for finding steady states and its application to FSM verification |
| |
Gagan Hasteer,
Anmol Mathur,
Prithviraj Banerjee
|
|
Pages: 611-614 |
|
doi>10.1145/277044.277203 |
|
Full text: PDF
|
|
Finding the set of steady states of a machine has applications in formal verification, sequential synthesis and ATPG. Existing techniques assume the presence of a designated set of initial states which is impractical in a real design environment. The set of steady states of a design is defined by the terminally strongly connected components (tSCCs) of the underlying state transition graph (STG). We show that multiple tSCCs and non-terminal SCCs need to be handled in a real design environment especially for verification. We present a fully implicit algorithm to find the steady states of a machine without any knowledge of initial states. We demonstrate the utility of our algorithm by applying it to FSM equivalence checking.
|
|
|
Hybrid verification using saturated simulation |
| |
Adnan Aziz,
Jim Kukula,
Tom Shiple
|
|
Pages: 615-618 |
|
doi>10.1145/277044.277204 |
|
Full text: PDF
|
|
We develop a verification paradigm called saturated simulation, that is applicable to designs which can be decomposed into a set of interacting controllers. The core procedure is a symbolic algorithm that explores the space of controller interactions; heuristics for making this traversal efficient are described. Experiments demonstrate that our procedure explores substantially more of the controller interactions, and is more efficient than conventional symbolic reachability analysis.
|
|
|
Fast state verification |
| |
Dechang Sun,
Bapiraju Vinnakota,
Wanli Jiang
|
|
Pages: 619-624 |
|
doi>10.1145/277044.277205 |
|
Full text: PDF
|
|
Unique input/output (UIO) sequences are used for state verification and functional test in finite state machines. A UIO sequence for a state s distinguishes it from other states in the FSM. Current algorithms to compute UIO sequences are limited in their applicability to FSMs with binary input symbols such as those found in control applications. Execution times of traditional approaches are exponential in the number of FSM inputs. We develop a new heuristic algorithm to generate UIO sequences for FSMs with binary inputs. Execution time is reduced significantly by reducing the size of the search space. When a UIO sequence cannot be generated, our algorithm generates a small number of functional faults for state verification.
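The notion of a UIO sequence can be made concrete with a naive brute-force search over input sequences; for contrast, this enumerates exactly the exponential search space that the paper's heuristic is designed to prune. The Mealy machine and function names below are hypothetical examples, not the authors' algorithm.

```python
from itertools import product

def find_uio(fsm, state, inputs, max_len=4):
    """Brute-force search for a UIO sequence for `state`.

    fsm: {state: {input: (next_state, output)}} -- a completely
         specified Mealy machine.
    Returns the shortest input tuple (up to max_len, in lexicographic
    order) whose output sequence distinguishes `state` from every
    other state, or None if none is found.
    """
    def out_seq(s, seq):
        outs = []
        for a in seq:
            s, o = fsm[s][a]
            outs.append(o)
        return tuple(outs)

    for n in range(1, max_len + 1):
        for seq in product(inputs, repeat=n):
            ref = out_seq(state, seq)
            # seq is a UIO for `state` iff no other state produces
            # the same output sequence.
            if all(out_seq(s, seq) != ref for s in fsm if s != state):
                return seq
    return None

# Toy 3-state Mealy machine over binary inputs.
fsm = {
    'A': {0: ('A', 0), 1: ('B', 1)},
    'B': {0: ('C', 0), 1: ('A', 0)},
    'C': {0: ('A', 1), 1: ('B', 0)},
}
print(find_uio(fsm, 'A', inputs=(0, 1)))  # -> (1,)
```

With b binary inputs the alphabet has 2^b symbols, so for the control-dominated FSMs the abstract mentions this enumeration blows up quickly; the paper's heuristic attacks exactly that growth.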
|
|
|
A fast sequential learning technique for real circuits with application to enhancing ATPG performance |
| |
Aiman El-Maleh,
Mark Kassab,
Janusz Rajski
|
|
Pages: 625-631 |
|
doi>10.1145/277044.277206 |
|
Full text: PDF
|
|
This paper presents an efficient and novel method for sequential learning of implications, invalid states, and tied gates. It can handle real industrial circuits, with multiple clock domains and partial set/reset. The application of this method to improve the efficiency of sequential ATPG is also demonstrated by achieving higher fault coverages and lower test generation times.
|
|
|
Fault-simulation based design error diagnosis for sequential circuits |
| |
Shi-Yu Huang,
Kwang-Ting Cheng,
Kuang-Chien Chen,
Juin-Yeu Joseph Lu
|
|
Pages: 632-637 |
|
doi>10.1145/277044.277207 |
|
Full text: PDF
|
|
This paper addresses the problem of locating design errors in a sequential circuit. For single-error circuits, we consider a signal ƒ as a potential error source only if the circuit can be completely rectified by re-synthesizing ƒ (i.e., changing the function of signal ƒ). In order to handle larger circuits, we do not rely on Binary Decision Diagrams. Instead, we search for potential error sources by a modified sequential fault simulation process. The main contributions of this paper are two-fold: (1) we derive the necessary and sufficient condition of whether an erroneous input sequence (i.e., an input sequence producing erroneous responses) can be corrected by changing the function of a particular internal signal; and (2) we propose a modified fault simulation procedure to check this condition. Our approach does not rely on any error model, and thus, is suitable for general types of errors. Furthermore, it can be easily extended to identify multiple errors. Experimental results on ISCAS89 benchmark circuits are presented to demonstrate its capability.
|
|
|
Functional verification of a multiple-issue, out-of-order, superscalar Alpha processor—the DEC Alpha 21264 microprocessor |
| |
Scott Taylor,
Michael Quinn,
Darren Brown,
Nathan Dohm,
Scot Hildebrandt,
James Huggins,
Carl Ramey
|
|
Pages: 638-643 |
|
doi>10.1145/277044.277208 |
|
Full text: PDF
|
|
DIGITAL's Alpha 21264 processor is a highly out-of-order, superpipelined, superscalar implementation of the Alpha architecture, capable of a peak execution rate of six instructions per cycle and a sustainable rate of four per cycle. The 21264 also features a 500 MHz clock speed and a high-bandwidth system interface that channels up to 5.3 Gbytes/second of cache data and 2.6 Gbytes/second of main-memory data into the processor. Simulation-based functional verification was performed on the logic design using implementation-directed, pseudo-random exercisers, supplemented with implementation-specific, hand-generated tests. Extensive functional coverage analysis was performed to grade and direct the verification effort. The success of the verification effort was underscored by first prototype chips which were used to boot multiple operating systems across several different prototype systems.
|
|
|
Design reliability—estimation through statistical analysis of bug discovery data |
| |
Yossi Malka,
Avi Ziv
|
|
Pages: 644-649 |
|
doi>10.1145/277044.277209 |
|
Full text: PDF
|
|
Statistical analysis of bug discovery data is used in the software industry to check the quality of the testing process and estimate the reliability of the tested program. In this paper, we show that the same techniques are applicable to hardware design verification. We performed a study on two implementations of state-of-the-art PowerPC processors that shows that these techniques can provide quality information on the progress of verification and good predictions of the number of bugs left in the design and the future MTTF.
|
|
|
Functional verification of large ASICs |
| |
Adrian Evans,
Allan Silburt,
Gary Vrckovnik,
Thane Brown,
Mario Dufresne,
Geoffrey Hall,
Tung Ho,
Ying Liu
|
|
Pages: 650-655 |
|
doi>10.1145/277044.277210 |
|
Full text: PDF
|
|
This paper describes the functional verification effort during a specific hardware development program that included three of the largest ASICs designed at Nortel. These devices marked a transition point in methodology as verification took front and centre on the critical path of the ASIC schedule. Both the simulation and emulation strategies are presented. The simulation methodology introduced new techniques such as ASIC sub-system level behavioural modeling, large multi-chip simulations, and random pattern simulations. The emulation strategy was based on a plan that consisted of integrating parts of the real software on the emulated system. This paper describes how these technologies were deployed, analyzes the bugs that were found and highlights the bottlenecks in functional verification as systems become more complex.
|
|
|
The EDA start-up experience (panel): the first product |
| |
Erach Desai
|
|
Pages: 656-657 |
|
doi>10.1145/277044.277211 |
|
Full text: PDF
|
|
How does a novel EDA idea get transformed into a commercially successful product? Six veteran EDA entrepreneurs will discuss their experiences in bringing their companies' first products to market. Where did their ideas come from? How did they know their ideas would meet real customer needs? And how many customers would there be? How are evolutionary and revolutionary products developed and marketed differently? When did the entrepreneurs stop developing and start shipping? Who were their competitors and their partners? Did they adopt industry standards or create new ones? How did they use advertising, DAC, and the WWW to promote their products? What are the relative merits of direct, VAR, and OEM selling?
|
|
|
Digital system simulation: methodologies and examples |
| |
Kunle Olukotun,
Mark Heinrich,
David Ofelt
|
|
Pages: 658-663 |
|
doi>10.1145/277044.277212 |
|
Full text: PDF
|
|
Two major trends in the digital design industry are the increase in system complexity and the increasing importance of short design times. The rise in design complexity is motivated by consumer demand for higher performance products as well as increases in integration density which allow more functionality to be placed on a single chip. A consequence of this rise in complexity is a significant increase in the amount of simulation required to design digital systems. Simulation time typically scales as the square of the increase in system complexity [4]. Short design times are important because once a design has been conceived there is a limited time window in which to bring the system to market while its performance is competitive. Simulation serves many purposes during the design cycle of a digital system. In the early stages of design, high-level simulation is used for performance prediction and analysis. In the middle of the design cycle, simulation is used to develop the software algorithms and refine the hardware. In the later stages of design, simulation is used to make sure performance targets are reached and to verify the correctness of the hardware and software. The different simulation objectives require varying levels of modeling detail. To keep design time to a minimum, it is critical to structure the simulation environment to make it possible to trade off simulation performance for model detail in a flexible manner that allows concurrent hardware and software development. In this paper we describe the different simulation methodologies for developing complex digital systems, and give examples of one such simulation environment. The rest of this paper is organized as follows. In Section 2 we describe and classify the various simulation methodologies that are used in digital system design and describe how they are used in the various stages of the design cycle. In Section 3 we provide examples of the methodologies. We describe a sophisticated simulation environment used to develop a large ASIC for the Stanford FLASH multiprocessor.
|
|
|
Hybrid techniques for fast functional simulation |
| |
Yufeng Luo,
Tjahjadi Wongsonegoro,
Adnan Aziz
|
|
Pages: 664-667 |
|
doi>10.1145/277044.277213 |
|
Full text: PDF
|
|
We implement and experiment with techniques for the functional simulation of very large digital systems. We consider techniques that are a hybrid of classical compiled code simulation and recent branching program based simulation in order to resolve memory performance problems inherent to BDD based cycle simulation. Specifically, predefined functional units (“macros”) are extracted from the circuit and evaluated directly instead of building BDDs for them. The functionality of those macros, such as multipliers, filters, etc., can in turn be verified by simulation of their gate-level implementations or by formal verification techniques. Our results demonstrate that this approach leads to considerably faster simulation.
|
|
|
A reconfigurable logic machine for fast event-driven simulation |
| |
Jerry Bauer,
Michael Bershteyn,
Ian Kaplan,
Paul Vyedin
|
|
Pages: 668-671 |
|
doi>10.1145/277044.277214 |
|
Full text: PDF
|
|
As the density of VLSI circuits increases, software techniques cannot effectively simulate designs through the millions of simulation cycles needed for verification. Emulation can supply the necessary capacity and performance, but emulation is limited to designs that are structural or can be synthesized. This paper discusses a new system architecture that dramatically accelerates event-driven behavioral simulation and describes how it is merged with emulation.
|
|
|
Parallel algorithms for power estimation |
| |
Victor Kim,
Prithviraj Banerjee
|
|
Pages: 672-677 |
|
doi>10.1145/277044.277215 |
|
Full text: PDF
|
|
Several techniques currently exist for estimating the power dissipation of combinational and sequential circuits using exhaustive simulation, Monte Carlo sampling, and probabilistic estimation. Exhaustive simulation and Monte Carlo sampling techniques can be highly reliable but often require long runtimes. This paper presents a comprehensive study of pattern-partitioning and circuit-partitioning parallelization schemes for those two methodologies in the context of distributed-memory multiprocessing systems. Issues in pipelined event-driven simulation and dynamic load balancing are addressed. Experimental results are presented for an IBM SP-2 system and a network of HP-9000 workstations. For instance, runtimes have been reduced from over 3 hours to under 20 minutes in one case.
|
|
|
A power macromodeling technique based on power sensitivity |
| |
Zhanping Chen,
Kaushik Roy
|
|
Pages: 678-683 |
|
doi>10.1145/277044.277216 |
|
Full text: PDF
|
|
In this paper, we propose a novel power macromodeling technique for high level power estimation based on power sensitivity. Power sensitivity defines the change in average power due to changes in the input signal specification. The contribution of this work is that we can use only a few points to construct a complicated power surface in the specification-space. With such a power surface, we can easily obtain the power dissipation under any distribution of primary inputs. The advantages of our technique are two-fold. First, the required parameters corresponding to each representative point can be efficiently obtained by only one symbolic power estimation run or by only one Monte Carlo based statistical power estimation process. This stems from the fact that power sensitivity can be obtained as a by-product of probabilistic or statistical power estimation runs. Second, the memory requirements for the macromodel are reduced to O(dn), where n is the number of primary inputs of a circuit and d is the number of representative points (d can be as small as 1 in some cases). Results on a number of benchmark circuits demonstrate the effectiveness of our technique.
|
|
|
Maximum power estimation using the limiting distributions of extreme order statistics |
| |
Qinru Qiu,
Qing Wu,
Massoud Pedram
|
|
Pages: 684-689 |
|
doi>10.1145/277044.277217 |
|
Full text: PDF
|
|
In this paper we present a statistical method for estimating the maximum power consumption in VLSI circuits. The method is based on the theory of extreme order statistics applied to the probabilistic distribution of the cycle-based power consumption, maximum likelihood estimation, and Monte-Carlo simulation. The method can predict the maximum power in the constrained space of given input vector pairs as well as the complete space of all possible input vector pairs. The simulation-based nature of the proposed method allows one to avoid the limitations imposed by simple gate-level delay models and handle arbitrary circuit structures. The proposed method can produce maximum power estimates to satisfy user-specified error and confidence levels. Experimental results show that this method provides maximum power estimates within 5% of the actual value and with a 90% confidence level by simulating, on average, about 2500 vector pairs.
|
|
|
An optimization-based error calculation for statistical power estimation of CMOS logic circuits |
| |
Byunggyu Kwak,
Eun Sei Park
|
|
Pages: 690-693 |
|
doi>10.1145/277044.277218 |
|
Full text: PDF
|
|
In this paper, we present a statistical power estimation method where estimation time and accuracy can be balanced by assigning smaller errors to the nodes with higher power dissipation and higher errors to the nodes with lower power dissipation. To calculate the error rates for individual nodes, a quadratic programming based problem will be formulated which incorporates the distribution data of all individual node switching activities. Also, an iterative statistical power estimation system will be presented. Finally, we will demonstrate experimental results which show drastic reduction in the number of simulation patterns compared to previous methods.
|
|
|
Using complementation and resequencing to minimize transitions |
| |
Rajeev Murgai,
Masahiro Fujita,
Arlindo Oliveira
|
|
Pages: 694-697 |
|
doi>10.1145/277044.277219 |
|
Full text: PDF
|
|
Recently, in [3], the following problem was addressed: Given a set of data words or messages to be transmitted over a bus such that the sequence (order) in which they are transmitted is irrelevant, determine the optimum sequence that minimizes the total number of transitions on the bus. In 1994, Stan and Burleson [5] presented the bus-invert method as a means of encoding words for reducing I/O power, in which a word may be inverted and then transmitted if doing so reduces the number of transitions. In this paper, we combine the two paradigms into one — that of sequencing words under the bus-invert scheme for the minimum transitions, i.e., words can be complemented, reordered and then transmitted. We prove that this problem DOPI — Data Ordering Problem with Inversion — is NP-complete. We present a polynomial-time approximation algorithm to solve DOPI that comes within a factor of 1.5 from the optimum. Experimental results show that, on average, the solutions generated by our algorithm were within 4.4% of the optimum, and that resequencing along with complementation leads to 34.4% reduction in switching activity.
|
|
|
Technology mapping for large complex PLDs |
| |
Jason Helge Anderson,
Stephen Dean Brown
|
|
Pages: 698-703 |
|
doi>10.1145/277044.277220 |
|
Full text: PDF
|
|
In this paper we present a new technology mapping algorithm for use with complex PLDs (CPLDs), which consist of a large number of PLA-style logic blocks. Although the traditional synthesis approach for such devices uses two-level minimization, the complexity of recently-produced CPLDs has resulted in a trend toward multi-level synthesis. We describe an approach that allows existing multi-level synthesis techniques [13] to be adapted to produce circuits that are well-suited for implementation in CPLDs. Our algorithm produces circuits that require up to 90% fewer logic blocks than the circuits produced by a recently-published algorithm.
|
|
|
Delay-optimal technology mapping for FPGAs with heterogeneous LUTs |
| |
Jason Cong,
Songjie Xu
|
|
Pages: 704-707 |
|
doi>10.1145/277044.277221 |
|
Full text: PDF
|
|
Recent generations of FPGAs take advantage of the speed and density benefits resulting from heterogeneous architectures, which provide either an array of homogeneous programmable logic blocks (PLBs), each configured to implement circuits with lookup tables (LUTs) of different sizes, or an array of physically heterogeneous LUTs. LUTs with different sizes usually have different delays. This paper presents the first polynomial-time optimal technology mapping algorithm, named HeteroMap, for delay minimization in heterogeneous FPGA designs. For a heterogeneous FPGA consisting of K1-LUTs, K2-LUTs, …, and Kc-LUTs, HeteroMap computes the minimum delay mapping solution in O(∑i=1..c Ki·n·m·log n) time for a circuit netlist with n gates and m edges. The HeteroMap algorithm generates favorable results for Xilinx XC4000 series FPGAs and Lucent ORCA2C series FPGAs. Furthermore, the optimality of the HeteroMap algorithm enables us to quantitatively evaluate various heterogeneous architectures without the bias of mapping heuristics.
|
|
|
Exact tree-based FPGA technology mapping for logic blocks with independent LUTs |
| |
Madhukar R. Korupolu,
K. K. Lee,
D. F. Wong
|
|
Pages: 708-711 |
|
doi>10.1145/277044.277222 |
|
Full text: PDF
|
|
The logic blocks (CLBs) of a lookup table (LUT) based FPGA consist of one or more LUTs, possibly of different sizes. In this paper, we focus on technology mapping for CLBs with several independent LUTs of two different sizes (called ICLBs). The Actel ES6500 family is an example of a class of commercially available ICLBs. Given a tree network with n nodes, the only previously known approach for minimum area tree-based mapping to ICLBs was a heuristic with running time Θ(n^(d+1)), where d is the maximum indegree of any node. We give an O(n^3) time exact algorithm for mapping a given tree network, an improvement over this heuristic in terms of run time and solution quality. For general networks, an effective strategy is to break the network into trees and combine them. We also give an O(n^3) exact algorithm for combining the optimal solutions to these trees, under the condition that LUTs do not go across trees. The method can be extended to solve mapping onto CLBs that can be configured into different ICLBs (e.g., Xilinx's XC4000E).
|
|
|
Compatible class encoding in hyper-function decomposition for FPGA synthesis |
| |
Jie-Hong R. Jiang,
Jing-Yang Jou,
Juinn-Dar Huang
|
|
Pages: 712-717 |
|
doi>10.1145/277044.277223 |
|
Full text: PDF
|
|
Recently, functional decomposition has been adopted for LUT based FPGA technology mapping with good results. In this paper, we propose a novel method for functional multiple-output decomposition. We first address a compatible class encoding method to minimize the compatible classes in the image function. After the encoding algorithm is applied, the decomposability will be improved in the subsequent decomposition of the image function. The above encoding algorithm is then extended to encode multiple-output functions through the construction of a hyper-function. Common sub-expressions among these multiple-output functions can be extracted during the decomposition of the hyper-function. Therefore, we can handle the multiple-output decomposition in the same manner as the single-output decomposition. Experimental results show that our algorithms are very promising.
|
|
|
In-place power optimization for LUT-based FPGAs |
| |
Balakrishna Kumthekar,
Luca Benini,
Enrico Macii,
Fabio Somenzi
|
|
Pages: 718-721 |
|
doi>10.1145/277044.277224 |
|
Full text: PDF
|
|
This paper presents a new technique to perform power-oriented re-configuration of a system implemented using LUT FPGAs. The main features of our approach are: Accurate exploitation of degrees of freedom, concurrent optimization of multiple LUTs based on Boolean relations, and in-place re-programming without re-routing. Our tool optimizes the combinational component of the CLBs after layout, and does not require any re-wiring. Hence, delay and CLB usage are left unchanged, while power is minimized. As the algorithm operates locally on the various LUT clusters, it performs best on large examples as demonstrated by our experimental results: An average power reduction of 20.6% has been obtained on standard benchmarks.
|
|
|
A re-engineering approach to low power FPGA design using SPFD |
| |
Jan-Min Hwang,
Feng-Yi Chiang,
TingTing Hwang
|
|
Pages: 722-725 |
|
doi>10.1145/277044.277225 |
|
Full text: PDF
|
|
In this paper, we present a method to re-synthesize Look-Up Table (LUT) based Field Programmable Gate Arrays (FPGAs) for low power design after technology mapping, placement and routing are performed. We use Set of Pairs of Functions to be Distinguished (SPFD) to express functional permissibility of each signal. Using different propagations of SPFD to fan-in signals, we change the functionality of a PLB (Programmable Logic Block) which drives large loading into one with low transition density. Experimental results show that our method can reduce power consumption by 12% on average compared to the original circuits without affecting placement and routing.
|
|
|
Power considerations in the design of the Alpha 21264 microprocessor |
| |
Michael K. Gowan,
Larry L. Biro,
Daniel B. Jackson
|
|
Pages: 726-731 |
|
doi>10.1145/277044.277226 |
|
Full text: PDF
|
|
Power dissipation is rapidly becoming a limiting factor in high performance microprocessor design due to ever increasing device counts and clock rates. The 21264 is a third generation Alpha microprocessor implementation, containing 15.2 million transistors and operating at 600 MHz. This paper describes some of the techniques the Alpha design team utilized to help manage power dissipation. In addition, the electrical design of the power, ground, and clock networks is presented.
|
|
|
Reducing power in high-performance microprocessors |
| |
Vivek Tiwari,
Deo Singh,
Suresh Rajgopal,
Gaurav Mehta,
Rakesh Patel,
Franklin Baez
|
|
Pages: 732-737 |
|
doi>10.1145/277044.277227 |
|
Full text: PDF
|
|
Power consumption has become one of the biggest challenges in high-performance microprocessor design. The rapid increase in the complexity and speed of each new CPU generation is outstripping the benefits of voltage reduction and feature size scaling. Designers are thus continuously challenged to come up with innovative ways to reduce power, while trying to meet all the other constraints imposed on the design. This paper presents an overview of the issues related to power consumption in the context of Intel CPUs. The main trends that are driving the increased focus on design for low power are described. System and benchmarking issues, and sources of power consumption in a high-performance CPU are briefly described. Techniques that have been tried on real designs in the past are described. The role of CAD tools and their limitations in this domain will also be discussed. In addition, areas that need increased research focus in the future are also pointed out.
|
|
|
Design and analysis of power distribution networks in PowerPC microprocessors |
| |
Abhijit Dharchoudhury,
Rajendran Panda,
David Blaauw,
Ravi Vaidyanathan,
Bogdan Tutuianu,
David Bearden
|
|
Pages: 738-743 |
|
doi>10.1145/277044.277229 |
|
Full text: PDF
|
|
We present a methodology for the design and analysis of power grids in the PowerPC™ microprocessors. The methodology covers the need for power grid analysis across all stages of the design process. A case study showing the application of this methodology to the PowerPC™ 750 microprocessor is presented.
|
|
|
Full-chip verification methods for DSM power distribution systems |
| |
Gregory Steele,
David Overhauser,
Steffen Rochel,
Syed Zakir Hussain
|
|
Pages: 744-749 |
|
doi>10.1145/277044.277231 |
|
Full text: PDF
|
|
Power distribution verification is rapidly becoming a necessary step in deep submicron (DSM) design of high performance integrated circuits. With the increased load and reduced tolerances of DSM circuits, more failures are being seen due to poorly designed power distribution systems. This paper describes an efficient approach for the verification of power distribution at the full-chip transistor level based on a combination of hierarchical static and dynamic techniques. Application of the methodology on practical design examples will be provided. We will also demonstrate the necessity of an analysis at the full-chip transistor level to verify the complex interactions between different design blocks based on static and dynamic effects.
|
|
|
System chip test challenges, are there solutions today? (panel) |
| |
Prab Varma
|
|
Pages: 750-751 |
|
doi>10.1145/277044.277232 |
|
Full text: PDF
|
|
To meet the challenges involved in designing systems on silicon, IC designers are increasingly adopting a design re-use methodology, in which pre-designed logic modules, often called virtual components (VC) or intellectual property (IP) cores, are integrated together to construct system chips. Testing such increasingly complex systems comprising components that may be “black-boxes” is a major problem. The panel will discuss the challenges associated with testing system chips containing a diverse range of pre-designed VCs and will address the question of whether there are viable solutions today. It will discuss the impact of design re-use on test methodologies and will discuss whether the time to market gains offered by design re-use will be realized without test re-use. It will examine the role of scan, BIST, at-speed testing etc. as virtual component test techniques, and will discuss the issues involved in providing test access to facilitate test re-use. It will debate the impact of the SIA road-map on test requirements and will discuss the limitations in ATE that will have to be overcome and the issues in performance testing that will have to be addressed to test tomorrow's system chips. In addition, it will look at the role of standards (such as those being proposed by VSIA and IEEE) and discuss whether standardization is possible.
|
|
|
System-chip test strategies |
| |
Yervant Zorian
|
|
Pages: 752-757 |
|
doi>10.1145/277044.277234 |
|
Full text: PDF
|
|
A major challenge in realizing core-based system-chips is the adoption of adequate test and diagnosis strategies. This paper focuses on the current industrial practices in test strategies for system-chips. It discusses the challenges in testing embedded cores, the testing requirements for individual cores, and their test access mechanisms. It also covers the integrated test strategies for system-chips based on reusable cores. In addition to the state-of-the-art practices in testability schemes, this paper covers the current standardization efforts for embedded core test interface mechanisms.
|
|
|
Finite state machine decomposition for low power |
| |
José C. Monteiro,
Arlindo L. Oliveira
|
|
Pages: 758-763 |
|
doi>10.1145/277044.277235 |
|
Full text: PDF
|
|
Clock-gating techniques have been shown to be very effective in the reduction of the switching activity in sequential logic circuits. In this paper we describe a new clock-gating technique based on finite state machine (FSM) decomposition. We compute two sub-FSMs that together have the same functionality as the original FSM. For all the transitions within one sub-FSM, the clock for the other sub-FSM is disabled. To minimize the average switching activity, we search for a small cluster of states with high stationary state probability and use it to create the small sub-FSM. This way we have a small amount of logic that is active most of the time, during which it disables the much larger circuit, the other sub-FSM.
We provide a set of experimental results that show that power consumption can be substantially reduced, in some cases up to 80%.
|
|
|
Computational kernels and their application to sequential power optimization |
| |
L. Benini,
G. De Micheli,
A. Lioy,
E. Macii,
G. Odasso,
M. Poncino
|
|
Pages: 764-769 |
|
doi>10.1145/277044.277237 |
|
Full text: PDF
|
|
We introduce a new sequential optimization paradigm based on the extraction of computational kernels, i.e., logic blocks whose behavior mimics the steady-state behavior of the original circuit. We present a procedure for the automatic extraction of such kernels directly from the gate-level description of the design. The advantage of this solution with respect to extraction algorithms based on STG analysis is that it can be applied to large circuits, since it does not require manipulating the STG specification.
We exploit computational kernels for optimization purposes; in particular, we describe an architectural decomposition paradigm whose template is reminiscent of the mux-based scheme adopted in parallel implementations of logic-level descriptions.
We show the usefulness of the new optimization style by applying it to the problem of reducing the power dissipated by a sequential circuit. Experimental results, obtained on standard benchmarks, demonstrate the merit of the proposed approach.
|
|
|
Partitioning and optimizing controllers synthesized from hierarchical high-level descriptions |
| |
Andrew Seawright,
Wolfgang Meyer
|
|
Pages: 770-775 |
|
doi>10.1145/277044.277239 |
|
Full text: PDF
|
|
This paper describes methods for partitioning and optimizing controllers described by hierarchical high-level descriptions. The methods utilize the structure of the high-level description, provide flexible exploration of the trade-off between combinational logic and registers to reduce implementation cost, and allow the designer to control the synthesis process. Results are presented using industrial examples.
|
|
|
Watermarking techniques for intellectual property protection |
| |
A. B. Kahng,
J. Lach,
W. H. Mangione-Smith,
S. Mantik,
I. L. Markov,
M. Potkonjak,
P. Tucker,
H. Wang,
G. Wolfe
|
|
Pages: 776-781 |
|
doi>10.1145/277044.277240 |
|
Full text: PDF
|
|
Digital system designs are the product of valuable effort and know-how. Their embodiments, from software and HDL program down to device-level netlist and mask data, represent carefully guarded intellectual property (IP). Hence, design methodologies based on IP reuse require new mechanisms to protect the rights of IP producers and owners. This paper establishes principles of watermarking-based IP protection, where a watermark is a mechanism for identification that is (i) nearly invisible to human and machine inspection, (ii) difficult to remove, and (iii) permanently embedded as an integral part of the design. We survey related work in cryptography and design methodology, then develop desiderata, metrics and example approaches — centering on constraint-based techniques — for watermarking at various stages of the VLSI design process.
|
|
|
Robust IP watermarking methodologies for physical design |
| |
Andrew B. Kahng,
Stefanus Mantik,
Igor L. Markov,
Miodrag Potkonjak,
Paul Tucker,
Huijuan Wang,
Gregory Wolfe
|
|
Pages: 782-787 |
|
doi>10.1145/277044.277241 |
|
Full text: PDF
|
|
Increasingly popular reuse-based design paradigms create a pressing need for authorship enforcement techniques that protect the intellectual property rights of designers. We develop the first intellectual property protection protocols for embedding design watermarks at the physical design level. We demonstrate that these protocols are transparent with respect to existing industrial tools and design flows, and that they can embed watermarks into real-world industrial designs with very low implementation overhead (as measured by such standard metrics as wirelength, layout area, number of vias, routing congestion and CPU time). On several industrial test cases, we obtain extremely strong, tamper-resistant proofs of authorship for placement and routing solutions.
|
|
|
Data security for Web-based CAD |
| |
Scott Hauck,
Stephen Knol
|
|
Pages: 788-793 |
|
doi>10.1145/277044.277242 |
|
Full text: PDF
|
|
Internet-based computing has significant potential for improving most high-performance computing, including VLSI CAD. In this paper we consider the ramifications of the Internet on electronics design, and develop two models for Web-based CAD. We also investigate the security of these systems, and propose methods for protection against threats both from unrelated users, as well as from the CAD tools and tool developers themselves. These techniques provide methods for hiding unnecessary information. Such techniques will be key to the development of future Internet-based CAD applications, since serious CAD users will be unwilling to use any CAD methodology that risks exposing their designs to outsiders. By enabling Web-based CAD, these techniques can improve CAD performance, enable collaboratory design, and create a usage-based pricing methodology.
|
|
|
Design of a SPDIF receiver using protocol compiler |
| |
Ulrich Holtmann,
Peter Blinzer
|
|
Pages: 794-799 |
|
doi>10.1145/277044.277243 |
|
Full text: PDF
|
|
This paper describes the design of a receiver for the digital audio signal SPDIF used by CD-ROM players. The design was done with Protocol Compiler, a high-level synthesis tool for the design of structured data stream processing controllers.
Compared to traditional RTL design, Protocol Compiler makes entry, debugging, and re-use easier. Design time was cut by a factor of 2 while the results in terms of area and delay are competitive.
|
|
|
MetaCore: an application specific DSP development system |
| |
Jin-Hyuk Yang,
Byoung-Woon Kim,
Sang-Jun Nam,
Jang-Ho Cho,
Sung-won Seo,
Chang-Ho Ryu,
Young-Su Kwon,
Dae-Hyun Lee,
Jong-Yeol Lee,
Jong-Sun Kim,
Hyun-Dhong Yoon,
Jae-Yeol Kim,
Kun-Moo Lee,
Chan-Soo Hwang,
In-Hyung Kim,
Jun-Sung Kim,
Kwang-Il Park,
Kyu-Ho Park,
Yong-Hoon Lee,
Seung-Hoon Hwang,
In-Cheol Park,
Chong-Min Kyung
|
|
Pages: 800-803 |
|
doi>10.1145/277044.277247 |
|
Full text: PDF
|
|
This paper describes the MetaCore system, an ASIP (Application-Specific Instruction set Processor) development system targeted at DSP applications. The goal of the MetaCore system is to offer an efficient design methodology meeting specifications given as a combination of performance, cost, and design turnaround time.
The MetaCore system consists of two major design stages: design exploration and design generation. In the design exploration stage, the system accepts a set of benchmark programs and a formal specification of the ISA (Instruction Set Architecture), and estimates the hardware cost and performance for each hardware configuration being explored. Once a hardware configuration is chosen, the system helps generate a VLSI processor design in the form of HDL along with application program development tools such as a C compiler, assembler, and instruction set simulator.
|
|
|
A case study in embedded system design: an engine control unit |
| |
Tullio Cuatto,
Claudio Passerone,
Luciano Lavagno,
Attila Jurecska,
Antonino Damiano,
Claudio Sansoè,
Alberto Sangiovanni-Vincentelli
|
|
Pages: 804-807 |
|
doi>10.1145/277044.277248 |
|
Full text: PDF
|
|
A number of techniques and software tools for embedded system design have been recently proposed. However, the current practice in the designer community is heavily based on manual techniques and on past experience rather than on a rigorous approach to design. To advance the state of the art it is important to address a number of relevant design problems and solve them to demonstrate the power of the new approaches.
We chose an industrial example in automotive electronics to validate our design methodology: an existing, commercially available Engine Control Unit. We discuss in detail the specification, the implementation philosophy, and the architectural trade-off analysis. We analyze the results obtained with our approach and compare them with the existing design, underlining the advantages offered by a systematic approach to embedded system design in terms of performance and design time.
|
|
|
HW/SW coverification performance estimation and benchmark for a 24 embedded RISC core design |
| |
Thomas W. Albrecht,
Johann Notbauer,
Stefan Rohringer
|
|
Pages: 808-811 |
|
doi>10.1145/277044.277250 |
|
Full text: PDF
|
|
This paper describes the benchmarking of a HW/SW-coverification design strategy. The benchmark results were the basis for a principal verification decision in an already ongoing project at Siemens AG, Public Communication Network Group. The intention of this benchmark was to verify whether commercially available coverification tools can handle the design complexity of an embedded system containing 24 embedded RISC cores and provide the necessary performance in terms of simulation speed and throughput.
|
|
|
System-level exploration with SpecSyn |
| |
Daniel D. Gajski,
Frank Vahid,
Sanjiv Narayan,
Jie Gong
|
|
Pages: 812-817 |
|
doi>10.1145/277044.277252 |
|
Full text: PDF
|
|
We present the SpecSyn system-level design environment supporting the specify-explore-refine (SER) design paradigm. This three-step approach includes precise specification of system functionality, rapid exploration of numerous system-level design options, and refinement of the specification into one reflecting the chosen option. A system-level design option consists of an allocation of system components, like standard and custom processors, and a partitioning of functionality among those components. Focusing on SpecSyn's exploration techniques, we emphasize its two-phase estimation approach and highlight experiments using SpecSyn.
|
|
|
Author Index |
|
Page: 818 |
|
|
|
|
Presentations from the 35th DAC: 35 years of design automation |
| |
Massoud Pedram
|
|
Page: 819 |
|
|