Dhabaleswar Kumar (DK) Panda

Professional ACM Member

Bibliometrics (publication history):
Average citations per article: 6.87
Citation count: 2,046
Publication count: 298
Publication years: 1988-2019
Available for download: 81
Average downloads per article: 323.36
Downloads (cumulative): 26,192
Downloads (12 months): 2,949
Downloads (6 weeks): 305

298 results found

Results 1-20 of 298

1 published by ACM
June 2019 HPDC '19: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 15,   Downloads (12 Months): 98,   Downloads (Overall): 98

Full text available: PDF
Distributed storage systems typically need data to be stored redundantly to guarantee data durability and reliability. While the conventional approach towards this objective is to store multiple replicas, today's unprecedented data growth rates encourage modern distributed storage systems to employ Erasure Coding (EC) techniques, which can achieve better storage efficiency. ...
Keywords: distributed storage systems, high performance, multi-rail erasure coding
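To make the storage-efficiency claim in the abstract above concrete, here is a back-of-the-envelope sketch (not taken from the paper). It assumes an MDS erasure code such as Reed-Solomon with k data chunks and m parity chunks; the RS(6, 3) parameters are illustrative choices, and the helper names are hypothetical.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data under n-way replication."""
    return float(replicas)


def replication_tolerance(replicas: int) -> int:
    """Number of lost copies that n-way replication can survive."""
    return replicas - 1


def ec_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for a (k, m) erasure code:
    each stripe holds k data chunks plus m parity chunks."""
    return (k + m) / k


def ec_tolerance(m: int) -> int:
    """Number of lost chunks a (k, m) code survives, assuming it is MDS
    (true for Reed-Solomon): any k of the k + m chunks rebuild the data."""
    return m


if __name__ == "__main__":
    # Illustrative comparison: 3-way replication vs. a hypothetical RS(6, 3) layout.
    print("3-way replication:", replication_overhead(3), "x, survives", replication_tolerance(3), "failures")
    print("RS(6, 3):", ec_overhead(6, 3), "x, survives", ec_tolerance(3), "failures")
```

With these parameters, replication stores 3.0 bytes per user byte and survives two failures, while RS(6, 3) stores 1.5 bytes and survives three. The price, which motivates hardware-accelerated and multi-rail EC designs, is the encode/decode computation that replication avoids.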

2 published by ACM
April 2019 GPGPU '19: Proceedings of the 12th Workshop on General Purpose Processing Using GPUs
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 26,   Downloads (12 Months): 152,   Downloads (Overall): 152

Full text available: PDF
The CUDA Unified Memory (UM) interface enables a significantly simpler programming paradigm and has the potential to fundamentally change the way programmers write CUDA applications in the future. Although UM leads to high productivity in programming using CUDA by simplifying the programmer's view of CPU and GPU memory spaces, initial ...
Keywords: CUDA, HPC, MPI, Unified Memory

3 published by ACM
February 2019 PPoPP '19: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 14,   Downloads (12 Months): 130,   Downloads (Overall): 130

Full text available: PDF
The current wave of advances in Deep Learning (DL) has led to many exciting challenges and opportunities for Computer Science and Artificial Intelligence researchers alike. Modern DL frameworks like Caffe2, TensorFlow, Cognitive Toolkit (CNTK), PyTorch, and several others have emerged that offer ease of use and flexibility to describe, train, ...
Keywords: DNN training, HPC, MPI, high-performance deep learning, machine learning

4
November 2018 SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 7,   Downloads (12 Months): 102,   Downloads (Overall): 102

Full text available: PDF
With the emergence of larger multi-/many-core clusters and new areas of HPC applications, performance of large message communication is becoming more important. MPI libraries use different rendezvous protocols to perform large message communication. However, existing rendezvous protocols do not take the overall communication pattern into account or make optimal use ...
Keywords: HPC, MPI, rendezvous protocols

5 published by ACM
October 2018 SoCC '18: Proceedings of the ACM Symposium on Cloud Computing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 11,   Downloads (12 Months): 90,   Downloads (Overall): 90

Full text available: PDF
Various hardware-based Erasure Coding (EC) schemes have been proposed [5, 6, 8, 12-14] to leverage the advanced compute capabilities in modern data centers. Currently, there is no unified and easy way for distributed storage systems to fully exploit multiple devices such as CPUs, GPUs, and network devices (i.e., multi-rail support) ...

6 published by ACM
September 2018 EuroMPI'18: Proceedings of the 25th European MPI Users' Group Meeting
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2,   Downloads (12 Months): 47,   Downloads (Overall): 47

Full text available: PDF
Intel Knights Landing (KNL) and IBM POWER architectures are becoming widely deployed on modern supercomputing systems due to their powerful components. The MPI Remote Memory Access (RMA) model, which provides one-sided communication semantics, has been seen as an attractive approach for developing High-Performance Data Analytics (HPDA) applications such as graph processing ...
Keywords: Graph500, HPC, KNL, MPI RMA, POWER

7 published by ACM
September 2018 EuroMPI'18: Proceedings of the 25th European MPI Users' Group Meeting
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 9,   Downloads (12 Months): 110,   Downloads (Overall): 110

Full text available: PDF
Traditionally, MPI runtimes have been designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and dense multi-GPU systems, it has become important to design efficient communication schemes. This, coupled with new application workloads brought forward by Deep Learning frameworks like Caffe and Microsoft ...
Keywords: CUDA-Aware MPI, Distributed Deep Learning, HPC, MPI_Bcast, NCCL

8 published by ACM
September 2018 EuroMPI'18: Proceedings of the 25th European MPI Users' Group Meeting
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 16,   Downloads (12 Months): 90,   Downloads (Overall): 90

Full text available: PDF
The overlap of computation and communication is critical for good performance of many HPC applications. State-of-the-art designs for the asynchronous progress require specially designed hardware resources (advanced switches or network interface cards), dedicated processor cores or application modification (e.g., use of MPI_Test). These techniques suffer from various issues like increasing ...
Keywords: Async progress, Blocking/nonblocking operations, Collective operations, Communication computation overlap, HPC, MPI, P3DFFT, SPEC MPI2007 benchmarks

9 published by ACM
December 2017 UCC '17: Proceedings of the 10th International Conference on Utility and Cloud Computing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 8,   Downloads (12 Months): 62,   Downloads (Overall): 162

Full text available: PDF
Significant growth has been witnessed during the last few years in HPC clusters with multi-/many-core processors, accelerators, and high-performance interconnects (such as InfiniBand, Omni-Path, iWARP, and RoCE). To alleviate the cost burden, sharing HPC cluster resources to end users through virtualization for both scientific computing and Big Data processing is ...
Keywords: big data, container, deep learning, hpc clouds, infiniband, mpi, virtual machine

10 published by ACM
December 2017 BDCAT '17: Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 10,   Downloads (12 Months): 144,   Downloads (Overall): 329

Full text available: PDF
In recent years there has been a surge in applications focusing on streaming data to generate insights in real-time. Both academia, as well as industry, have tried to address this use case by developing a variety of Stream Processing Engines (SPEs) with a diverse feature set. On the other hand, ...
Keywords: big data, hpc clusters, message queue, profiling, real time, stream processing

11 published by ACM
December 2017 UCC '17: Proceedings of the 10th International Conference on Utility and Cloud Computing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 8,   Downloads (12 Months): 99,   Downloads (Overall): 217

Full text available: PDF
The Message Passing Interface (MPI) standard has become the de facto programming model for parallel computing through 25 years of continuous community effort. With ongoing efforts to build efficient HPC clouds, more and more MPI-based HPC applications are starting to run in cloud-based environments. Singularity is one of the most attractive container ...
Keywords: container, hpc clouds, intel knl, intel omni-path, mpi, singularity

12 published by ACM
November 2017 SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 14,   Downloads (12 Months): 88,   Downloads (Overall): 239

Full text available: PDF
Existing designs for MPI_Allreduce do not take advantage of the vast parallelism available in modern multi-/many-core processors like Intel Xeon/Xeon Phis or the increases in communication throughput and recent advances in high-end features seen with modern interconnects like InfiniBand and Omni-Path. In this paper, we propose a high-performance and scalable ...
Keywords: MPI, MPI_allreduce, SHArP, collectives, data partitioning, multi-leader

13 published by ACM
November 2017 MLHPC'17: Proceedings of the Machine Learning on HPC Environments
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 21,   Downloads (12 Months): 217,   Downloads (Overall): 390

Full text available: PDF
Traditionally, Deep Learning (DL) frameworks like Caffe, TensorFlow, and Cognitive Toolkit exploited GPUs to accelerate the training process. This has been primarily achieved by aggressive improvements in parallel hardware as well as through sophisticated software frameworks like cuDNN and cuBLAS. However, recent enhancements to CPU-based hardware and software have the ...
Keywords: High-Performance Computing, Unified Memory, Caffe, Deep Learning, Pascal Architecture

14 published by ACM
September 2017 EuroMPI '17: Proceedings of the 24th European MPI Users' Group Meeting
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 4,   Downloads (12 Months): 31,   Downloads (Overall): 103

Full text available: PDF
MPI implementations are becoming increasingly complex and highly tunable, and thus scalability limitations can come from numerous sources. The MPI Tools Interface (MPI_T) introduced as part of the MPI 3.0 standard provides an opportunity for performance tools and external software to introspect and understand MPI runtime behavior at a deeper ...
Keywords: BEACON, MPI_T, MVAPICH2, TAU, autotuning, performance engineering, performance recommendations, runtime introspection

15 published by ACM
July 2017 PEARC17: Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 5,   Downloads (12 Months): 36,   Downloads (Overall): 156

Full text available: PDF
The Stampede 1 supercomputer was a tremendous success as an XSEDE resource, providing more than eight million successful computational simulations and data analysis jobs to more than ten thousand users. In addition, Stampede 1 introduced new technology that began to move users towards many core processors. As Stampede 1 reaches ...
Keywords: High Performance Computing, Supercomputing, Xeon Phi

16
May 2017 CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0,   Downloads (12 Months): 16,   Downloads (Overall): 94

Full text available: PDF
Running Big Data applications in the cloud has become extremely popular in recent times. To enable the storage of data for these applications, cloud-based distributed storage solutions are a must. OpenStack Swift is an object storage service which is widely used for such purposes. Swift is one of the main ...
Keywords: RDMA, Swift, OpenStack, High-performance interconnects

17 published by ACM
April 2017 VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 9,   Downloads (12 Months): 39,   Downloads (Overall): 184

Full text available: PDF
Hypervisor-based virtualization solutions provide good security and isolation, while container-based solutions make applications and workloads more portable and distributed in an effective, standardized, and repeatable way. Therefore, nested virtualization based computing environments (e.g., container over virtual machine), which inherit the capabilities from both solutions, are becoming more and more attractive ...
Keywords: CMA, MPI, Nested Virtualization, Container, IVShmem, Cloud Computing, Hypervisor, SR-IOV
Also published in:
September 2017  ACM SIGPLAN Notices - VEE '17: Volume 52 Issue 7, July 2017

18 published by ACM
January 2017 PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher: ACM
Bibliometrics:
Citation Count: 10
Downloads (6 Weeks): 25,   Downloads (12 Months): 276,   Downloads (Overall): 1,504

Full text available: PDF
Availability of large data sets like ImageNet and massively parallel computation support in modern HPC devices like NVIDIA GPUs have fueled a renewed interest in Deep Learning (DL) algorithms. This has triggered the development of DL frameworks like Caffe, Torch, TensorFlow, and CNTK. However, most DL frameworks have been limited ...
Keywords: caffe, cuda-aware mpi, mpi_reduce, distributed training, deep learning
Also published in:
October 2017  ACM SIGPLAN Notices - PPoPP '17: Volume 52 Issue 8, August 2017

19 published by ACM
December 2016 BDCAT '16: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 3,   Downloads (12 Months): 11,   Downloads (Overall): 84

Full text available: PDF
Big Data Systems are becoming increasingly complex and generally have very high operational costs. Cloud computing offers attractive solutions for managing large scale systems. However, one of the major bottlenecks in VM performance is virtualized I/O. Since Big Data applications and middleware rely heavily on high performance interconnects such as ...
Keywords: infiniband, SR-IOV, big data, hadoop, virtualization

20
November 2016 PAW '16: Proceedings of the First Workshop on PGAS Applications
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0

PGAS models, with their lightweight synchronization and shared-memory abstraction, are seen as a good alternative to the message-passing model for irregular communication patterns. OpenSHMEM is a library-based PGAS model. OpenSHMEM 1.3 introduced non-blocking data movement operations to provide better asynchronous progress and overlap. In this paper, we ...
Keywords: computers and information processing, computer science, programming, parallel programming



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.