 Eduardo HM M Cruz

Bibliometrics: publication history
Average citations per article: 2.88
Citation Count: 49
Publication count: 17
Publication years: 2010-2017
Available for download: 4
Average downloads per article: 255.75
Downloads (cumulative): 1,023
Downloads (12 Months): 252
Downloads (6 Weeks): 23


19 results found

1
July 2018
Bibliometrics:
Citation Count: 0

This book presents a study on how thread and data mapping techniques can be used to improve the performance of multi-core architectures. It describes how the memory hierarchy introduces non-uniform memory access, and how mapping can be used to reduce the memory access latency in current hardware architectures. On the ...

2 published by ACM
May 2017 CF'17: Proceedings of the Computing Frontiers Conference
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1,   Downloads (12 Months): 45,   Downloads (Overall): 94

Full text available: PDF
Optimizing the memory access behavior is an important challenge to improve the performance and energy consumption of parallel applications on shared memory architectures. Modern systems contain complex memory hierarchies with multiple memory controllers and several levels of caches. In such machines, analyzing the affinity between threads and data to map ...
Keywords: NUMA, data mapping, thread mapping, memory affinity

3
May 2017 International Journal of High Performance Computing Applications: Volume 31 Issue 3, May 2017
Publisher: Sage Publications, Inc.
Bibliometrics:
Citation Count: 0

Many modern parallel architectures feature a nonuniform memory access (NUMA) behavior since they contain several memory controllers. In such architectures, deciding where to place memory pages has a high influence on the performance of parallel applications. This placement of pages to NUMA nodes is called data mapping. Two basic types ...
Keywords: NUMA, data mapping, memory access behavior, balance, locality

4 published by ACM
December 2016 ACM Computing Surveys (CSUR): Volume 49 Issue 4, February 2017
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 13,   Downloads (12 Months): 145,   Downloads (Overall): 529

Full text available: PDF
Shared memory architectures have recently experienced a large increase in thread-level parallelism, leading to complex memory hierarchies with multiple cache memory levels and memory controllers. These new designs created a Non-Uniform Memory Access (NUMA) behavior, where the performance and energy consumption of memory accesses depend on the place where the ...
Keywords: NUMA, Survey, cache memories, communication, data mapping, shared memory, thread mapping

5 published by ACM
September 2016 ACM Transactions on Architecture and Code Optimization (TACO): Volume 13 Issue 3, September 2016
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 2,   Downloads (12 Months): 33,   Downloads (Overall): 138

Full text available: PDF
The performance and energy efficiency of modern architectures depend on memory locality, which can be improved by thread and data mappings considering the memory access behavior of parallel applications. In this article, we propose intense pages mapping, a mechanism that analyzes the memory access behavior using information about the time ...
Keywords: NUMA, Thread mapping, cache memory, communication, data mapping, data sharing

6
September 2016 IEEE Transactions on Parallel and Distributed Systems: Volume 27 Issue 9, September 2016
Publisher: IEEE Press
Bibliometrics:
Citation Count: 1

Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a major challenge in shared-memory architectures. Modern systems have deep and complex memory hierarchies with multiple cache levels and memory controllers, leading to a Non-Uniform Memory Access (NUMA) behavior. In such systems, there are two ways ...

7
August 2016 Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833
Publisher: Springer-Verlag New York, Inc.
Bibliometrics:
Citation Count: 0

In modern shared-memory architectures, it is important to map threads and data in a way that increases the locality of their memory accesses, thereby improving performance and energy efficiency. Threads that access shared data should be mapped close to each other in the memory hierarchy, while the data they access ...

8
May 2016 Parallel Computing: Volume 54 Issue C, May 2016
Publisher: Elsevier Science Publishers B. V.
Bibliometrics:
Citation Count: 0

We detect the memory access patterns in shared memory applications. Using the detected access patterns, we map the threads and data to improve performance. Provide a better usage of hardware resources. We reduce execution time, cache misses and traffic on interconnections. No need to modify applications or runtime environment. The performance and energy efficiency ...
Keywords: Communication, Page table, Data mapping, NUMA, Thread mapping

9
December 2015 Concurrency and Computation: Practice & Experience: Volume 27 Issue 17, December 2015
Publisher: John Wiley and Sons Ltd.
Bibliometrics:
Citation Count: 0

Threads of parallel applications need to communicate in order to fulfill their tasks. The communication performance between the cores in modern multi-core architectures differs because of the memory and interconnection hierarchies. In these architectures, it is important to map the threads of parallel applications by taking into account the communication ...
Keywords: shared memory, translation lookaside buffer, thread mapping

10
March 2015 PDP '15: Proceedings of the 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 2

In parallel architectures that have a Non-Uniform Memory Access (NUMA) behavior, the mapping of memory pages to NUMA nodes influences the performance of parallel applications. In order to improve traditional data mapping policies, two basic strategies can be employed: optimizing locality or balance of memory accesses. In a locality-based policy, ...
Keywords: Data mapping, NUMA, Locality, Load balance

11
March 2015 PDP '15: Proceedings of the 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 5

The communication between tasks of a parallel application is an important characteristic to consider when mapping tasks to computing cores due to possible differences in communication performance. Within a machine, performance differences are introduced by the memory hierarchy, in which cache memories can be shared by groups of cores and ...
Keywords: task mapping, communication, hardware topology, memory hierarchy

12
March 2015 Parallel Computing: Volume 43 Issue C, March 2015
Publisher: Elsevier Science Publishers B. V.
Bibliometrics:
Citation Count: 2

We perform online detection of inter-process and inter-thread communication. Detected communication pattern is used to migrate processes and threads. Operating System-based mechanism, no changes to applications or runtime libraries. We reduce execution time and energy consumption. Evaluation on shared memory machines and a cluster show substantial improvements. The rising complexity of memory hierarchies and ...
Keywords: Shared memory, Communication optimization, Mapping, Parallel applications

13
October 2014 SBAC-PAD '14: Proceedings of the 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

One of the main challenges for modern parallel shared-memory architectures is access to main memory. In current systems, the performance and energy efficiency of memory accesses depend on their locality: accesses to remote caches and NUMA nodes are more expensive than accesses to local ones. Increasing the locality requires knowledge ...

14 published by ACM
August 2014 PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 7,   Downloads (12 Months): 27,   Downloads (Overall): 218

Full text available: PDF
One of the main challenges for parallel architectures is the increasing complexity of the memory hierarchy, which consists of several levels of private and shared caches, as well as interconnections between separate memories in NUMA machines. To make full use of this hierarchy, it is necessary to improve the locality ...
Keywords: cache hierarchies, numa, data affinity, thread affinity

15
March 2014 Journal of Parallel and Distributed Computing: Volume 74 Issue 3, March, 2014
Publisher: Academic Press, Inc.
Bibliometrics:
Citation Count: 4

In current computer architectures, the communication performance between threads varies depending on the memory hierarchy. This performance difference must be considered when mapping parallel applications to processor cores. In parallel applications based on the shared memory paradigm, the communication is difficult to detect because it is implicit. Furthermore, dynamic mapping ...
Keywords: Communication pattern, Shared memory, Cache coherence protocols, Thread communication, Parallel applications, Thread mapping

16
May 2013 IPDPS '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 8

In current shared memory architectures, the complexity of the cache and memory hierarchies is increasing. Therefore, it is becoming more important to analyze the communication behavior of parallel applications when mapping threads to cores, to improve performance and energy efficiency. However, communication is implicit in most programming models for shared ...
Keywords: Mapping, Communication Detection, Page Table, Shared Pages

17
May 2012 IPDPS '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 6

The communication latency between the cores in multiprocessor architectures differs depending on the memory hierarchy and the interconnections. With the increase of the number of cores per chip and the number of threads per core, this difference between the communication latencies is increasing. Therefore, it is important to map the ...
Keywords: Thread mapping, Parallel applications, Shared memory, TLB, Translation Lookaside Buffer, Interconnections, Cache Misses

18
October 2010 WSCAD-SCC '10: Proceedings of the 2010 11th Symposium on Computing Systems
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Process mapping is a technique widely used in parallel machines to provide performance gains by improving the use of resources such as interconnections and the cache memory hierarchy. The problem of finding the best mapping is considered NP-hard and, in shared memory environments, there is the additional difficulty of finding the ...

19
October 2010 WSCAD-SCC '10: Proceedings of the 2010 11th Symposium on Computing Systems
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Computer simulation has allowed the analysis of the behavior and performance of systems still in their design phase. The Advanced Superscalar Simulator project is a tool for simulation of a complete computer system, involving the simulation of a superscalar processor and an input and output system, with infrastructure for symmetric multiprocessing. ...



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.