 George Bosilca

Bibliometrics: publication history
Average citations per article: 4.29
Citation Count: 193
Publication count: 45
Publication years: 2005-2017
Available for download: 15
Average downloads per article: 129.67
Downloads (cumulative): 1,945
Downloads (12 Months): 493
Downloads (6 Weeks): 56

47 results found

Result 1 – 20 of 47

1 published by ACM
June 2018 HPDC '18: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 12,   Downloads (12 Months): 75,   Downloads (Overall): 75

Full text available: PDF
The increase in scale and heterogeneity of high-performance computing (HPC) systems predisposes the performance of Message Passing Interface (MPI) collective communications to be susceptible to noise, and to adapt to a complex mix of hardware capabilities. The designs of state of the art MPI collectives heavily rely on synchronizations; these ...
Keywords: GPU, MPI, collectives operations, event-driven, heterogeneous system, system noise

2
January 2018 International Journal of High Performance Computing Applications: Volume 32 Issue 1, January 2018
Publisher: Sage Publications, Inc.
Bibliometrics:
Citation Count: 0

Building an infrastructure for exascale applications requires, in addition to many other key components, a stable and efficient failure detector. This article describes the design and evaluation of a robust failure detector that can maintain and distribute the correct list of alive resources within proven and scalable bounds. The detection ...
Keywords: MPI, failure detection, fault tolerance

3 published by ACM
November 2017 ScalA '17: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 11,   Downloads (12 Months): 50,   Downloads (Overall): 50

Full text available: PDF
Successfully exploiting distributed collections of heterogeneous many-cores architectures with complex memory hierarchy through a portable programming model is a challenge for application developers. The literature is not short of proposals addressing this problem, including many evolutionary solutions that seek to extend the capabilities of current message passing paradigms with intra-node ...
Keywords: data-flow, dynamic task-graph, PaRSEC, task-based runtime

4 published by ACM
October 2017 Thematic Workshops '17: Proceedings of the Thematic Workshops of ACM Multimedia 2017
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 6,   Downloads (12 Months): 75,   Downloads (Overall): 75

Full text available: PDF
We consider the problem of how to reduce the cost of communication that is required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, like broadcasts of parameters or reductions for partial gradient aggregations, which for ...
Keywords: deep learning system, mpi collectives, neural networks

5 published by ACM
September 2017 EuroMPI '17: Proceedings of the 24th European MPI Users' Group Meeting
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 4,   Downloads (12 Months): 50,   Downloads (Overall): 50

Full text available: PDF
This paper details the implementation and usage of software-based performance counters to understand the performance of a particular implementation of the MPI standard, Open MPI. Such counters can expose intrinsic features of the software stack that are not available otherwise in a generic and portable way. The PMPI-interface is useful ...
Keywords: performance counters, profiling, MPI, tools

6
November 2016 SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 4,   Downloads (12 Months): 56,   Downloads (Overall): 187

Full text available: PDF
Building an infrastructure for Exascale applications requires, in addition to many other key components, a stable and efficient failure detector. This paper describes the design and evaluation of a robust failure detector, able to maintain and distribute the correct list of alive resources within proven and scalable bounds. The detection ...
Keywords: fault-tolerance, MPI, failure detection

7 published by ACM
May 2016 HPDC '16: Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 36,   Downloads (Overall): 116

Full text available: PDF
Due to better parallel density and power efficiency, GPUs have become more popular for use in scientific applications. Many of these applications are based on the ubiquitous Message Passing Interface (MPI) programming paradigm, and take advantage of non-contiguous memory layouts to exchange data between processes. However, support for efficient ...
Keywords: datatype, gpu, hybrid architecture, mpi, non-contiguous data

8
February 2016 Parallel Computing: Volume 52 Issue C, February 2016
Publisher: Elsevier Science Publishers B. V.
Bibliometrics:
Citation Count: 0

Algorithms for finding the optimal distribution compatible with a given data partition. Analysis of the algorithms for different cost metrics. NP-completeness proof for the redistribution problem followed by a computational kernel. Experimental results for the 1D-stencil kernel and the QR factorization algorithm. The classical redistribution problem aims at optimally scheduling communications when reshuffling ...
Keywords: Parsec, Redistribution, Stencil, QR factorization, Linear algebra, Data partition

9 published by ACM
November 2015 SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 2,   Downloads (12 Months): 18,   Downloads (Overall): 159

Full text available: PDF
The ability to consistently handle faults in a distributed environment requires, among a small set of basic routines, an agreement algorithm allowing surviving entities to reach a consensual decision between a bounded set of volatile resources. This paper presents an algorithm that implements an Early Returning Agreement (ERA) in pseudo-synchronous ...
Keywords: MPI, agreement, fault-tolerance

10 published by ACM
September 2015 EuroMPI '15: Proceedings of the 22nd European MPI Users' Group Meeting
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 8,   Downloads (Overall): 57

Full text available: PDF
Advanced failure recovery strategies in HPC systems benefit tremendously from in-place failure recovery, in which the MPI infrastructure can survive process crashes and resume communication services. In this paper we present the rationale behind the specification, and an effective implementation of the Revoke MPI operation. The purpose of the Revoke ...

11 published by ACM
September 2015 EuroMPI '15: Proceedings of the 22nd European MPI Users' Group Meeting
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2,   Downloads (12 Months): 16,   Downloads (Overall): 143

Full text available: PDF
This paper considers the questions of how spare nodes should be allocated, how to substitute them for faulty nodes, and how much the communication performance is affected by such a substitution. The third question stems from the modification of the rank mapping by node substitutions, which can incur additional message ...
Keywords: communication performance, spare node, fault mitigation, fault tolerance

12
September 2015 CLUSTER '15: Proceedings of the 2015 IEEE International Conference on Cluster Computing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 3

Task-based execution has been growing in popularity as a means to deliver a good balance between performance and portability in the post-petascale era. The Parallel Runtime Scheduling and Execution Control (PARSEC) framework is a task-based runtime system that we designed to achieve high performance computing at scale. PARSEC offers a ...
Keywords: PaRSEC, Tasks, DAG, PTG

13
August 2015 HOTI '15: Proceedings of the 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 4

This paper presents Unified Communication X (UCX), a set of network APIs and their implementations for high throughput computing. UCX comes from the combined effort of national laboratories, industry, and academia to design and implement a high-performing and highly-scalable network stack for next generation applications and systems. UCX design provides ...
Keywords: HPC, Middleware, MPI, OpenSHMEM, PGAS, RDMA, Infiniband

14
August 2015 OpenSHMEM 2015: Revised Selected Papers of the Second Workshop on OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies - Volume 9397
Publisher: Springer-Verlag New York, Inc.
Bibliometrics:
Citation Count: 0

This work details the opportunities and challenges of porting a Petascale, MPI-based application --LAMMPS-- to OpenSHMEM. We investigate the major programming challenges stemming from the differences in communication semantics, address space organization, and synchronization operations between the two programming models. This work provides several approaches to solve those challenges for ...

15
May 2015 IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

As the scale of modern computing systems grows, failures will happen more frequently. On the way to Exascale, a generic, low-overhead, resilient extension becomes a desired aptitude of any programming paradigm. In this paper we explore three additions to a dynamic task-based runtime to build a generic framework providing soft ...
Keywords: soft error resilience, runtime, fault tolerance

16
May 2015 IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 10

Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Despite significant advances in the programming interfaces to such hybrid architectures, traditional programming paradigms struggle with mapping the resulting multi-dimensional heterogeneity and the expression of algorithm parallelism, resulting in sub-optimal effective performance. Task-based programming ...
Keywords: PaRSEC runtime, GPU, dense linear algebra, heterogeneous architecture

17
April 2015 Supercomputing Frontiers and Innovations: an International Journal: Volume 2 Issue 2, April 2015
Publisher: South Ural State University
Bibliometrics:
Citation Count: 0

Ultrascale computing systems are meant to reach a growth of two or three orders of magnitude of today's computing systems. However, to achieve the performances required, we will need to design and implement more sustainable solutions for ultra-scale computing systems, understanding sustainability in a holistic manner to address challenges as ...
Keywords: MPI, resilience, MPI applications, programming models, MPI sustainability, data management

18 published by ACM
February 2015 ACM Transactions on Parallel Computing - Special Issue on PPOPP 2012: Volume 1 Issue 2, January 2015
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 3,   Downloads (12 Months): 31,   Downloads (Overall): 244

Full text available: PDF
Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific applications that require solving systems of linear equations, eigenvalues and linear least squares problems. Such computations are normally carried out on supercomputers, whose ever-growing scale induces a fast decline of the Mean Time To Failure (MTTF). ...
Keywords: ABFT, high performance computing, fault-tolerance, linear algebra

19
December 2014 Concurrency and Computation: Practice & Experience: Volume 26 Issue 17, December 2014
Publisher: John Wiley and Sons Ltd.
Bibliometrics:
Citation Count: 2

In this paper, we present a unified model for several well-known checkpoint/restart protocols. The proposed model is generic enough to encompass both extremes of the checkpoint/restart space, from coordinated approaches to a variety of uncoordinated checkpoint strategies with message logging. We identify a set of crucial parameters, instantiate them, and ...
Keywords: checkpoint/restart, hierarchical checkpoint with message logging, checkpointing waste optimization problem, coordinated checkpoint

20
November 2014 WOLFHPC '14: Proceedings of the 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Increased parallelism and use of heterogeneous computing resources is now an established trend in High Performance Computing (HPC), a trend that, looking forward to Exascale, seems bound to intensify. Despite the evolution of hardware over the past decade, the programming paradigm of choice was invariably derived from Coarse Grain Parallelism ...



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.