 Ramaswamy Govindarajan

Bibliometrics: publication history
Average citations per article: 5.06
Citation count: 359
Publication count: 71
Publication years: 1986-2017
Available for download: 24
Average downloads per article: 325.38
Downloads (cumulative): 7,809
Downloads (12 Months): 756
Downloads (6 Weeks): 376
71 results found

Result 1 – 20 of 71

1 published by ACM
December 2017 ACM Transactions on Architecture and Code Optimization (TACO): Volume 14 Issue 4, December 2017
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 346,   Downloads (12 Months): 346,   Downloads (Overall): 346

Full text available: PDF
Integrated Heterogeneous System (IHS) processors pack throughput-oriented General-Purpose Graphics Processing Units (GPGPUs) alongside latency-oriented Central Processing Units (CPUs) on the same die sharing certain resources, e.g., shared last-level cache, Network-on-Chip (NoC), and the main memory. The demands for memory accesses and other shared resources from GPU cores can exceed that ...
Keywords: 3D-stacked memory, Integrated CPU-GPU processors, cache sharing, DRAM cache

2 published by ACM
October 2016 MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 4,   Downloads (12 Months): 58,   Downloads (Overall): 81

Full text available: PDF
DRAM memory systems require periodic recharging to avoid loss of data from leaky capacitors. These refresh operations consume energy and reduce the duration of time for which the DRAM banks are available to service memory requests. Higher DRAM density and 3D-stacking aggravate the refresh overheads, incurring even higher energy and ...

3
October 2015 Journal of Signal Processing Systems: Volume 81 Issue 1, October 2015
Publisher: Kluwer Academic Publishers
Bibliometrics:
Citation Count: 0


4 published by ACM
January 2015 ICPE '15: Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 2,   Downloads (12 Months): 11,   Downloads (Overall): 154

Full text available: PDF
Stacked DRAM promises to offer unprecedented capacity and bandwidth to multi-core processors at moderately lower latency than off-chip DRAMs. A typical use of this abundant DRAM is as a large last level cache. Prior research works are divided on how to organize this cache and the proposed organizations fall into ...
Keywords: performance model, dram cache, memory system, multi-core architecture, analytical model

5
December 2014 MICRO-47: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 2,   Downloads (12 Months): 20,   Downloads (Overall): 181

Full text available: PDF
In this paper, we present Bi-Modal Cache - a flexible stacked DRAM cache organization which simultaneously achieves several objectives: (i) improved cache hit ratio, (ii) moving the tag storage overhead to DRAM, (iii) lower cache hit latency than tags-in-SRAM, and (iv) reduction in off-chip bandwidth wastage. The Bi-Modal Cache addresses ...

6 published by ACM
June 2014 SIGMETRICS '14: The 2014 ACM international conference on Measurement and modeling of computer systems
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 8,   Downloads (12 Months): 65,   Downloads (Overall): 592

Full text available: PDF
Memory system design is increasingly influencing modern multi-core architectures from both performance and power perspectives. However, predicting the performance of memory systems is complex, compounded by the myriad design choices and parameters along multiple dimensions, namely (i) technology, (ii) design and (iii) architectural choices. In this work, we construct an ...
Keywords: dram, memory system performance, analytical model
Also published in:
June 2014  ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review: Volume 42 Issue 1, June 2014

7 published by ACM
February 2014 CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Publisher: ACM
Bibliometrics:
Citation Count: 15
Downloads (6 Weeks): 3,   Downloads (12 Months): 104,   Downloads (Overall): 723

Full text available: PDF
Programming heterogeneous computing systems with Graphics Processing Units (GPU) and multi-core CPUs in them is complex and time-consuming. OpenCL has emerged as an attractive programming framework for heterogeneous systems. But utilizing multiple devices in OpenCL is a challenge because it requires the programmer to explicitly map data and computation to ...
Keywords: GPGPU, Runtime, FluidiCL, Heterogeneous Devices, OpenCL

8
February 2013 Journal of Signal Processing Systems: Volume 70 Issue 2, February 2013
Publisher: Kluwer Academic Publishers
Bibliometrics:
Citation Count: 0

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We ...
Keywords: Speech recognition, Dynamic time warping, Acoustic likelihood computations, Euclidean distance matrix computation, Low-rank matrix approximation

9
September 2010 SAS'10: Proceedings of the 17th international conference on Static analysis
Publisher: Springer-Verlag
Bibliometrics:
Citation Count: 5

We propose a novel formulation of the points-to analysis as a system of linear equations. With this, the efficiency of the points-to analysis can be significantly improved by leveraging the advances in solution procedures for solving the systems of linear equations. However, such a formulation is non-trivial and becomes challenging ...

10
August 2009 IEEE Transactions on Computers: Volume 58 Issue 8, August 2009
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. With the requirement to process packets at line rates, high-performance routers need to forward millions of packets every second with each packet needing up to seven memory accesses. Earlier work shows that a single cache for the ...
Keywords: Special-purpose and application-based systems, design, performance, experimentation, cache architectures, network processors, synthetic trace generation, trace driven simulation

11 published by ACM
June 2009 LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Publisher: ACM
Bibliometrics:
Citation Count: 11
Downloads (6 Weeks): 0,   Downloads (12 Months): 10,   Downloads (Overall): 443

Full text available: PDF
The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multicore architectures. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on accelerators such as Graphics Processing Units (GPUs) or CellBE which support abundant parallelism in hardware. In this paper, ...
Keywords: CUDA, GPU programming, software pipelining, stream programming, partitioning
Also published in:
June 2009  ACM SIGPLAN Notices - LCTES '09: Volume 44 Issue 7, July 2009

12
March 2009 CGO '09: Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 33
Downloads (6 Weeks): 2,   Downloads (12 Months): 18,   Downloads (Overall): 561

Full text available: PDF
The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. The StreamIt graphs describe ...
Keywords: CUDA, GPU Programming, Software Pipelining, Stream Programming

13 published by ACM
June 2008 ICS '08: Proceedings of the 22nd annual international conference on Supercomputing
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 2,   Downloads (12 Months): 12,   Downloads (Overall): 240

Full text available: PDF
Loads that miss in L1 or L2 caches and wait for their data at the head of the ROB cause significant slowdown in the form of commit stalls. We identify that most of these commit stalls are caused by a small set of loads, referred to as LIMCOS (Loads ...
Keywords: commit stalls, prefetch

14 published by ACM
April 2008 CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 0,   Downloads (12 Months): 11,   Downloads (Overall): 470

Full text available: PDF
Data-flow analysis is an integral part of any aggressive optimizing compiler. We propose a framework for improving the precision of data-flow analysis in the presence of complex control-flow. We initially perform data-flow analysis to determine those control-flow merges which cause the loss in data-flow analysis precision. The control-flow graph of ...
Keywords: data-flow analysis, code duplication, path-sensitive, restructuring, split graph, destructive merge, precision

15
January 2008 VLSID '08: Proceedings of the 21st International Conference on VLSI Design
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences critical system design objectives like area, power and performance. Hence the embedded system designer performs a complete memory ...

16
January 2008 Parallel Computing: Volume 34 Issue 1, January 2008
Publisher: Elsevier Science Publishers B. V.
Bibliometrics:
Citation Count: 2

In this paper, we study the scalability of an atmospheric modeling application on a cluster with commercially available off-the-shelf interconnects. It is found that interconnects with large latency and low bandwidth are major bottlenecks for performance scalability. Response curves for latency show that for large message sizes latency is extremely ...
Keywords: Cluster computing, Communication layer, Message compression, Parallel application, Atmospheric modeling

17
September 2007 QEST '07: Proceedings of the Fourth International Conference on Quantitative Evaluation of Systems
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Previous studies have shown that buffering packets in DRAM is a performance bottleneck. In order to understand the impediments in accessing the DRAM, we developed a detailed Petri net model of IP forwarding application on IXP2400 that models the different levels of the memory hierarchy. The cell based interface used ...

18
September 2007 PACT '07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1,   Downloads (12 Months): 1,   Downloads (Overall): 74

Full text available: PDF
Out-of-order superscalar processors require the ability to issue loads while older stores are in-flight. Forcing loads to wait for all older stores, including those on which they may not be dependent, to retire and write to the cache would reduce IPC and take away almost all the benefit of ...

19
September 2007 PACT '07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 2,   Downloads (Overall): 106

Full text available: PDF
Due to the tight coupling between processor cycle time and L1 access time, L1 caches are typically small and have low associativities. As a consequence they incur a higher percentage of conflict misses than lower level caches. The extent of conflict depends on the memory access pattern exhibited by the ...

20 published by ACM
March 2007 ACM Transactions on Architecture and Code Optimization (TACO): Volume 4 Issue 1, March 2007
Publisher: ACM
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 1,   Downloads (12 Months): 12,   Downloads (Overall): 859

Full text available: PDF
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. This paper proposes a three-step approach, called single-dimension software pipelining (SSP), to software pipeline a loop nest at an arbitrary loop level that has a rectangular ...
Keywords: modulo scheduling, Software pipelining, loop transformation



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.