Josep Maria Codina

Bibliometrics: publication history
Average citations per article: 7.38
Citation count: 118
Publication count: 16
Publication years: 2001-2012
Available for download: 9
Average downloads per article: 513.11
Downloads (cumulative): 4,618
Downloads (12 Months): 111
Downloads (6 Weeks): 13

16 results found


1
January 2012 IEEE Computer Architecture Letters: Volume 11 Issue 1, January 2012
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

In this paper, we propose a novel programmable functional unit (PFU) to accelerate general purpose application execution on a modern out-of-order x86 processor. Code is transformed and instructions are generated that run on the PFU using a co-designed virtual machine (Cd-VM). Results presented in this paper show that this HW/SW ...
Keywords: Hardware/software interfaces, Processor Architectures, Micro-architecture implementation considerations

2
October 2011 SBAC-PAD '11: Proceedings of the 2011 23rd International Symposium on Computer Architecture and High Performance Computing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

A co-designed processor helps in cutting down both the complexity and power consumption by co-designing certain key performance enablers. In this paper, we propose a FIFO based co-designed out-of-order processor. Multiple FIFOs are added in order to dynamically schedule, in a complexity-effective manner, the micro-ops. We propose a commit logic ...

3 published by ACM
May 2011 CF '11: Proceedings of the 8th ACM International Conference on Computing Frontiers
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 0,   Downloads (12 Months): 5,   Downloads (Overall): 133

Full text available: PDF
In this paper we propose SoftHV, a high-performance HW/SW co-designed in-order processor that performs horizontal and vertical fusion of instructions. SoftHV consists of a co-designed virtual machine (Cd-VM) which reorders, removes and fuses instructions from frequently executed regions of code. On the hardware front, SoftHV implements HW features for efficient ...
Keywords: micro-op fusion, co-designed virtual machine

4
February 2011 INTERACT '11: Proceedings of the 2011 15th Workshop on Interaction between Compilers and Computer Architectures
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

In this paper, we propose a novel programmable functional unit (PFU) to accelerate general purpose application execution on a modern out-of-order x86 processor in a complexity-effective way. Code is transformed and instructions are generated that run on the PFU using a co-designed virtual machine (Cd-VM). Groups of frequently executed micro-operations (micro-ops) ...
Keywords: Co-designed Virtual Machine, Programmable Functional Unit

5
September 2009 PACT '09: Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 3

Industry is moving towards multi-core designs as we have hit the memory and power walls. Multi-core designs are very effective at exploiting thread-level parallelism (TLP) but do not provide benefits when executing serial code (applications with low TLP, serial parts of a parallel application and legacy code). In this paper ...
Keywords: Speculative multithreading, thread-level parallelism, single-thread performance, multicore, automatic parallelization

6 published by ACM
June 2009 ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 1,   Downloads (12 Months): 27,   Downloads (Overall): 1,382

Full text available: PDF
Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single-thread performance remains of paramount importance since some applications have limited thread-level parallelism (TLP), and even a small part with limited TLP imposes important constraints on the global performance, as explained by Amdahl's ...
Keywords: automatic parallelization, single-thread performance, multicore, thread-level parallelism, core-fusion, speculative multithreading
Also published in:
June 2009  ACM SIGARCH Computer Architecture News: Volume 37 Issue 3, June 2009

7
June 2009 IEEE Transactions on Computers: Volume 58 Issue 6, June 2009
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 5

This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters and reduces the number of intercluster communications at the same time. Partitioning is guided by approximate schedules (i.e., pseudoschedules), which take into ...
Keywords: Clustered microarchitectures, ILP, instruction replication, modulo scheduling, statically scheduled processors

8
March 2007 CGO '07: Proceedings of the International Symposium on Code Generation and Optimization
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 2,   Downloads (12 Months): 8,   Downloads (Overall): 222

Full text available: PDF
Increasing performance, while at the same time reducing power consumption, is a major design tradeoff in current microprocessors. In this paper, we investigate the potential of using a heterogeneous clustered VLIW microarchitecture. In the proposed microarchitecture, each cluster, the interconnection network and the supporting memory hierarchy can run at different ...

9
March 2007 CGO '07: Proceedings of the International Symposium on Code Generation and Optimization
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 0,   Downloads (12 Months): 6,   Downloads (Overall): 288

Full text available: PDF
This paper presents an instruction scheduling and cluster assignment approach for clustered processors. The proposed technique makes use of a novel representation named the scheduling graph which describes all possible schedules. A powerful deduction process is applied to this graph, reducing at each step the set of possible schedules. In ...

10 published by ACM
June 2005 ACM SIGPLAN Notices - Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation: Volume 40 Issue 6, June 2005
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 2,   Downloads (12 Months): 16,   Downloads (Overall): 441

Full text available: PDF
Modulo scheduling is an effective code generation technique that exploits the parallelism in program loops by overlapping iterations. One drawback of this optimization is that register requirements increase significantly because values across different loop iterations can be live concurrently. One possible solution to reduce register pressure is to insert spill ...
Keywords: register allocation, spill code, modulo scheduling
Also published in:
June 2005  PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

11 published by ACM
June 2004 ACM Transactions on Architecture and Code Optimization (TACO): Volume 1 Issue 2, June 2004
Publisher: ACM
Bibliometrics:
Citation Count: 7
Downloads (6 Weeks): 2,   Downloads (12 Months): 14,   Downloads (Overall): 716

Full text available: PDF
The need to communicate values between clusters can result in a significant performance loss for clustered microarchitectures. In this work, we describe an optimization technique that removes communications by selectively replicating an appropriate set of instructions. Instruction replication is done carefully because it might degrade performance due to the increased ...
Keywords: modulo-scheduling, statically scheduled processors, ILP, instruction replication, Clustered microarchitectures

12
December 2003 MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 0,   Downloads (12 Months): 3,   Downloads (Overall): 300

Full text available: PDF
This work presents a new compilation technique that uses instruction replication in order to reduce the number of communications executed on a clustered microarchitecture. For such architectures, the need to communicate values between clusters can result in a significant performance loss. Inter-cluster communications can be reduced by selectively replicating an appropriate set of instructions. However, instruction replication must ...

13
September 2002 PACT '02: Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 14

This paper presents a new modulo scheduling algorithm for clustered microarchitectures. The main feature of the proposed scheme is that the assignment of instructions to clusters is done by means of graph partitioning algorithms that are guided by a pseudo-scheduler. This pseudo-scheduler is a simplified version of the full instruction scheduler ...

14 published by ACM
June 2002 ICS '02: Proceedings of the 16th international conference on Supercomputing
Publisher: ACM
Bibliometrics:
Citation Count: 18
Downloads (6 Weeks): 5,   Downloads (12 Months): 22,   Downloads (Overall): 667

Full text available: PDF
Modulo Scheduling is an instruction scheduling technique that is used by many current compilers. Different approaches have been proposed in the past, but there is no quantitative comparison among them using the same compiling platform, benchmarks and architectures. This paper presents a performance comparison of the most relevant Modulo Scheduling ...
Keywords: Modulo scheduling, instruction level parallel architectures, comparative study, quantitative evaluation, instruction scheduling

15
December 2001 MICRO 34: Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 21
Downloads (6 Weeks): 1,   Downloads (12 Months): 10,   Downloads (Overall): 469

Full text available: PDF
This work presents a novel scheme to schedule loops for clustered microarchitectures. The scheme is based on a preliminary cluster assignment phase implemented through graph partitioning techniques followed by a scheduling phase that integrates register allocation and spill code generation. The graph partitioning scheme is shown to be very effective ...

16
September 2001 PACT '01: Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 26

This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more effective than traditional approaches based on sequentially performing some (or all) of the three steps, since it allows ...
Keywords: Modulo scheduling, register allocation, spill code, cluster assignment, clustered architectures



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.