 José María Llabería

Bibliometrics (publication history):
Average citations per article: 4.12
Citation count: 202
Publication count: 49
Publication years: 1983–2013
Available for download: 16
Average downloads per article: 404.38
Downloads (cumulative): 6,470
Downloads (12 months): 142
Downloads (6 weeks): 15
48 results found

Result 1 – 20 of 48

1 published by ACM
December 2013 MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 5,   Downloads (12 Months): 43,   Downloads (Overall): 1,181

Full text available: PDF
Over recent years, a growing body of research has shown that a considerable portion of the shared last-level cache (SLLC) is dead, meaning that the corresponding cache lines are stored but will not receive any further hits before being replaced. Conversely, most hits observed by the SLLC come from ...
Keywords: last-level cache organization, reuse
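The notion of dead lines lends itself to a small illustration. The sketch below is a toy model of reuse-aware cache management, not the mechanism proposed in the paper: each line carries a reuse bit, lines are inserted unprotected, a hit sets the bit, and replacement prefers lines that have never shown reuse.

```python
# Toy model (illustrative only): one cache set whose replacement policy
# evicts lines that have never been reused before lines that have.

class ReuseAwareSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = []  # list of [tag, reused_bit], front = oldest

    def access(self, tag):
        for line in self.lines:
            if line[0] == tag:
                line[1] = True      # observed reuse: protect this line
                return True         # hit
        # miss: evict a victim if the set is full
        if len(self.lines) >= self.ways:
            # prefer the oldest line that has never been reused
            victim = next((l for l in self.lines if not l[1]), self.lines[0])
            self.lines.remove(victim)
        self.lines.append([tag, False])  # insert with low protection
        return False
```

In a short trace such as A, A, B, C, the line B (never reused) is evicted before A (reused), so lines that have proven useful survive longer, which is the intuition behind reuse-based SLLC management.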

2 published by ACM
January 2013 ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers: Volume 9 Issue 4, January 2013
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 5,   Downloads (12 Months): 23,   Downloads (Overall): 512

Full text available: PDF
Optimization of the replacement policy used for Shared Last-Level Cache (SLLC) management in a Chip-MultiProcessor (CMP) is critical for avoiding off-chip accesses. Temporal locality, while exploited by the first levels of private cache memories, is only weakly exhibited by the stream of references arriving at the SLLC. Thus, traditional replacement ...
Keywords: Replacement policy, shared resources management

3 published by ACM
January 2012 ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers: Volume 8 Issue 4, January 2012
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 1,   Downloads (12 Months): 21,   Downloads (Overall): 543

Full text available: PDF
Hardware data prefetch is a well-known technique for hiding memory latencies. However, in a multicore system fitted with a shared Last-Level Cache (LLC), prefetches induced by one core consume common resources such as shared cache space and main memory bandwidth. This may degrade the performance of other cores ...
Keywords: Prefetch, shared resources management

4
November 2011 Microprocessors & Microsystems: Volume 35 Issue 8, November, 2011
Publisher: Elsevier Science Publishers B. V.
Bibliometrics:
Citation Count: 0

Coherence protocols consume a significant fraction of power in determining which coherence action to perform. Specifically, on CMPs with a shared cache and a directory-based coherence protocol implemented as a duplicate of the local cache tags, we have observed that a large fraction of directory lookups result in a miss, because the block looked ...
Keywords: Coherence actions filtering, Coherence directory, Chip multiprocessor (CMP)
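One generic way to filter out lookups that are guaranteed to miss is a small Bloom-style filter in front of the directory. This is a textbook mechanism sketched here for illustration, not necessarily the filtering scheme proposed in the paper above: a lookup the filter rules out can skip the power-hungry tag comparison entirely.

```python
# Hedged illustration: a Bloom-style filter tracking which block addresses
# may be present in any local cache. might_contain() == False means the
# block is definitely absent, so the directory lookup can be skipped.

class LookupFilter:
    def __init__(self, bits=256):
        self.bits = bits
        self.bitmap = [False] * bits

    def _hashes(self, addr):
        # two cheap hash functions over the block address
        return (addr % self.bits,
                ((addr * 2654435761) >> 8) % self.bits)

    def insert(self, addr):
        for h in self._hashes(addr):
            self.bitmap[h] = True

    def might_contain(self, addr):
        # False => definitely absent: the lookup can be filtered out
        return all(self.bitmap[h] for h in self._hashes(addr))
```

A plain Bloom filter cannot remove entries when blocks are evicted; a real design would need a counting variant or periodic rebuilds, which is omitted here for brevity.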

5
August 2011 Euro-Par'11: Proceedings of the 17th international conference on Parallel processing - Volume Part I
Publisher: Springer-Verlag
Bibliometrics:
Citation Count: 0

In CMPs, coherence protocols are used to maintain data coherence among the multiple local caches. In this paper, we focus on CMPs using write-through local caches, and a directory-based coherence protocol implemented as a duplicate of the local cache tags. A large fraction of directory lookups is due to stores ...

6
September 2010 DSD '10: Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Coherence protocols consume a significant fraction of power in determining which coherence action should take place. In this paper we focus on CMPs with a shared cache and a directory-based coherence protocol implemented as a duplicate of the local cache tags. We observe that a large fraction of directory lookups produce ...

7
October 2009 IEEE Transactions on Computers: Volume 58 Issue 10, October 2009
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

This paper focuses on how to design a Store Buffer (STB) well suited to first-level multibanked data caches. The goal is to forward data from in-flight stores into dependent loads within the latency of a cache bank. Taking into account the store lifetime in the processor pipeline and the data ...
Keywords: cache memories, computer architecture, memory architecture, pipeline processing
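The core mechanism of a Store Buffer can be sketched in a few lines. The following is a flat, single-level illustration of store-to-load forwarding in general, not the multibanked two-level STB design studied in the paper: in-flight stores sit in program order, and a load searches youngest-first so it receives the most recent value written to its address before falling back to the cache.

```python
# Minimal store-buffer sketch (generic mechanism, not the paper's design):
# a load is satisfied by the youngest in-flight store to the same address,
# or by the cache if no such store exists.

class StoreBuffer:
    def __init__(self):
        self.entries = []  # (address, data), oldest first

    def store(self, addr, data):
        self.entries.append((addr, data))

    def load(self, addr, cache):
        for a, d in reversed(self.entries):   # youngest matching store wins
            if a == addr:
                return d                      # forwarded from the STB
        return cache.get(addr)                # no in-flight store: read cache
```

The challenge the paper addresses is doing this associative search within the latency of a cache bank, which a naive linear scan like the one above would not achieve in hardware.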

8
August 2009 Euro-Par '09: Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Publisher: Springer-Verlag
Bibliometrics:
Citation Count: 3

Understanding and optimizing the synchronization operations of parallel programs in distributed shared memory multiprocessors (DSM) is one of the most important factors leading to significant reductions in execution time. This paper introduces a new methodology for tuning the performance of parallel programs. We focus on the critical sections used ...

9
May 2009 IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Pipelining the scheduling logic, which exposes and exploits instruction-level parallelism, degrades processor performance. In a 4-issue processor, our evaluations show that pipelining the scheduling logic over two cycles degrades performance by 10% on the SPEC-2000 integer benchmarks. Such a performance degradation is due to sacrificing the ability to execute ...

10 published by ACM
September 2007 MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 0,   Downloads (12 Months): 2,   Downloads (Overall): 369

Full text available: PDF
Computer manufacturers today offer multicores with multi-threading capabilities and a broad range of core counts. An important market for these multicores is the server domain. Web servers are a widely used class of servers that provide access to files and also act as front-ends of more ...

11 published by ACM
August 2007 ISLPED '07: Proceedings of the 2007 international symposium on Low power electronics and design
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 3,   Downloads (Overall): 150

Full text available: PDF
In the presence of a long-latency instruction such as an L2 miss, the issue queue (IQ) may fill with instructions dependent on the L2 miss; consequently, the IQ will not expose instruction-level parallelism until the miss is resolved. In the scope of memory-latency tolerant processors, we propose delaying the insertion into the ...
Keywords: L2 hit-miss prediction, energy consumption, memory-latency tolerant processors

12
April 2007 Journal of Systems Architecture: the EUROMICRO Journal: Volume 53 Issue 4, April, 2007
Publisher: Elsevier North-Holland, Inc.
Bibliometrics:
Citation Count: 0

Value speculation is a speculative technique proposed to reduce the execution time of programs. It relies on a predictor, a checker, and a recovery mechanism. The predictor predicts the result of an instruction in order to speculatively issue its dependent instructions; the checker checks the prediction after issuing the predicted ...
Keywords: Address prediction, Value speculation, Recovery mechanism, Speculative execution

13
August 2006 Euro-Par'06: Proceedings of the 12th international conference on Parallel Processing
Publisher: Springer-Verlag
Bibliometrics:
Citation Count: 0

Synchronization in parallel programs is a major performance bottleneck. Shared data is protected by locks, and much time is spent in the contention arising at lock hand-off. During this period, a large amount of traffic is targeted at the line holding the lock variable. In ...

14 published by ACM
September 2005 ACM Transactions on Architecture and Code Optimization (TACO): Volume 2 Issue 3, September 2005
Publisher: ACM
Bibliometrics:
Citation Count: 14
Downloads (6 Weeks): 0,   Downloads (12 Months): 9,   Downloads (Overall): 624

Full text available: PDF
Thread-Level Speculation (TLS) provides architectural support to aggressively run hard-to-analyze code in parallel. As speculative tasks run concurrently, they generate unsafe or speculative memory state that needs to be separately buffered and managed in the presence of distributed caches and buffers. Such a state may contain multiple versions of the ...
Keywords: Caching and buffering support, coherence protocol, memory hierarchies, shared-memory multiprocessors, thread-level speculation

15
May 2005 ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 15
Downloads (6 Weeks): 0,   Downloads (12 Months): 4,   Downloads (Overall): 498

Full text available: PDF
This paper focuses on how to design a Store Buffer (STB) well suited to first-level multibanked data caches. Our goal is to forward data from in-flight stores to dependent loads with the latency of a cache bank. For that we propose a particular two-level STB design in which forwarding is ...
Also published in:
May 2005  ACM SIGARCH Computer Architecture News - ISCA 2005: Volume 33 Issue 2, May 2005

16
October 2003 IEEE Transactions on Parallel and Distributed Systems: Volume 14 Issue 10, October 2003
Publisher: IEEE Press
Bibliometrics:
Citation Count: 5

This paper presents a new cost-effective algorithm to compute exact loop bounds when multilevel tiling is applied to a loop nest having affine functions as bounds (a nonrectangular loop nest). Traditionally, exact loop bounds computation has not been performed because its complexity is doubly exponential in the number of loops in ...

17
September 2003 PACT '03: Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 2

In Thread-Level Speculation (TLS), speculative tasks generate memory state that cannot simply be combined with the rest of the system because it is unsafe. One way to deal with this difficulty is to allow speculative state to merge with memory but back up in an undo log the data that ...
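The undo-log idea described in this abstract is easy to sketch. The code below is an illustrative software analogue, with hypothetical names, of what the hardware does: before a speculative task overwrites a location, the old value is saved; if the task is squashed, the log is replayed in reverse to restore memory.

```python
# Sketch of undo logging for speculative tasks (illustrative only):
# speculative writes merge with memory, but old values are backed up
# so a squashed task can be rolled back.

class UndoLog:
    def __init__(self, memory):
        self.memory = memory
        self.log = []  # (addr, old_value), in write order

    def spec_write(self, addr, value):
        self.log.append((addr, self.memory.get(addr)))  # back up old value
        self.memory[addr] = value                       # merge with memory

    def commit(self):
        self.log.clear()          # speculation confirmed: drop backups

    def rollback(self):
        for addr, old in reversed(self.log):  # undo newest writes first
            if old is None:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.log.clear()
```

The trade-off versus separate speculative buffering is that commits become cheap (just discard the log) while squashes pay the cost of walking the log backwards.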

18
February 2003 HPCA '03: Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 19

Thread-level speculation provides architectural support to aggressively run hard-to-analyze code in parallel. As speculative tasks run concurrently, they generate unsafe or speculative memory state that needs to be separately buffered and managed in the presence of distributed caches and buffers. Such state may contain multiple versions of the same variable. In ...

19 published by ACM
July 2002 ACM Transactions on Programming Languages and Systems (TOPLAS): Volume 24 Issue 4, July 2002
Publisher: ACM
Bibliometrics:
Citation Count: 14
Downloads (6 Weeks): 1,   Downloads (12 Months): 8,   Downloads (Overall): 675

Full text available: PDF
Loop tiling is a well-known loop transformation generally used to expose coarse-grain parallelism and to exploit data reuse at the cache level. Tiling can also be used to exploit data reuse at the register level and to improve a program's ILP. However, previous proposals in the literature (as well as ...
Keywords: locality, loop optimization, loop tiling, data reuse, register level

20
September 2001 PACT '01: Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 15

Signalling result availability from the functional units to the instruction scheduler can increase the cycle time and/or the effective latency of the instructions. Knowledge of all instruction latencies would allow the instruction scheduler to operate without the need for external signalling. However, the latency of some instructions is ...



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.