 Pradeep Kumar Dubey

Bibliometrics: publication history
Average citations per article: 18.93
Citation count: 1,647
Publication count: 87
Publication years: 1986-2017
Available for download: 48
Average downloads per article: 1,389.94
Downloads (cumulative): 66,717
Downloads (12 Months): 5,887
Downloads (6 Weeks): 629
87 results found

Result 1 – 20 of 87

1 published by ACM
November 2017 SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 63,   Downloads (12 Months): 252,   Downloads (Overall): 252

Full text available: PDF
This paper presents the first, 15-PetaFLOP Deep Learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our Intelcaffe-based implementation obtains ~2TFLOP/s ...

2 published by ACM
November 2017 SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 6,   Downloads (12 Months): 49,   Downloads (Overall): 49

Full text available: PDF
The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application ...

3 published by ACM
June 2017 ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 203,   Downloads (12 Months): 1,863,   Downloads (Overall): 1,863

Full text available: PDF
Deep Neural Networks (DNNs) have demonstrated state-of-the-art performance on a broad range of tasks involving natural language, speech, image, and video processing, and are deployed in many real world applications. However, DNNs impose significant computational challenges owing to the complexity of the networks and the amount of data they process, ...
Keywords: Deep Neural Networks, System Architecture, Hardware Accelerators
Also published in:
September 2017  ACM SIGARCH Computer Architecture News - ISCA'17: Volume 45 Issue 2, May 2017

4
November 2016 SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 3,   Downloads (12 Months): 31,   Downloads (Overall): 66

Full text available: PDF
A b-Matching is a subset of edges M such that at most b(v) edges in M are incident on each vertex v, where b(v) is specified. We present a distributed-memory parallel algorithm, b-Suitor, that computes a b-M ...

5 published by ACM
April 2016 ACM Transactions on Computer Systems (TOCS): Volume 34 Issue 2, May 2016
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 27,   Downloads (12 Months): 310,   Downloads (Overall): 714

Full text available: PDF
Distributed in-memory key-value stores (KVSs), such as memcached, have become a critical data serving layer in modern Internet-oriented data center infrastructure. Their performance and efficiency directly affect the QoS of web services and the efficiency of data centers. Traditionally, these systems have had significant overheads from inefficient network processing, OS ...
Keywords: many core, cloud and network, energy efficiency, Key-value stores, storage performance

6
February 2016 International Journal of High Performance Computing Applications: Volume 30 Issue 1, February 2016
Publisher: Sage Publications, Inc.
Bibliometrics:
Citation Count: 0

This paper presents optimizations in a high-performance conjugate gradient benchmark (HPCG) for multi-core Intel® Xeon® processors and many-core Xeon Phi coprocessors. Without careful optimization, the HPCG benchmark under-utilizes the compute resources available in modern processors due to its low arithmetic intensity and challenges in parallelizing the Gauss-Seidel smoother ...
Keywords: Gauss-Seidel, directed acyclic graph, loop fusion, multi-grid, High-performance conjugate gradient, conjugate gradient, HPCG, Xeon Phi, task scheduling

7
February 2016 International Journal of High Performance Computing Applications: Volume 30 Issue 1, February 2016
Publisher: Sage Publications, Inc.
Bibliometrics:
Citation Count: 1

This paper presents a new optimized and scalable code for Hartree-Fock self-consistent field iterations. Goals of the code design include scalability to large numbers of nodes, and the capability to simultaneously use CPUs and Intel Xeon Phi coprocessors. Issues we encountered as we optimized and scaled up the code on ...
Keywords: Hartree-Fock, Intel Xeon Phi, load balancing, Tianhe-2, quantum chemistry, work stealing

8 published by ACM
November 2015 SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 5,   Downloads (12 Months): 118,   Downloads (Overall): 343

Full text available: PDF
Modern cosmology and plasma physics codes are now capable of simulating trillions of particles on petascale systems. Each timestep output from such simulations is on the order of 10s of TBs. Summarizing and analyzing raw particle data is challenging, and scientists often focus on density structures, whether in the real ...
Keywords: DBSCAN, KDTree, density-based clustering, parallel I/O

9 published by ACM
November 2015 SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 8,   Downloads (12 Months): 62,   Downloads (Overall): 266

Full text available: PDF
Algebraic Multigrid (AMG) is a linear solver, well known for its linear computational complexity and excellent parallelization scalability. As a result, AMG is expected to be a solver of choice for emerging extreme scale systems capable of delivering hundred Pflops and beyond. While node level performance of AMG is generally ...

10 published by ACM
November 2015 IA3 '15: Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 1,   Downloads (12 Months): 29,   Downloads (Overall): 119

Full text available: PDF
Graph partitioning is an important preprocessing step in applications dealing with sparse-irregular data. As such, the ability to efficiently partition a graph in parallel is crucial to the performance of these applications. The number of compute cores in a compute node continues to increase, demanding ever more scalability from shared-memory ...

11
July 2015 Proceedings of the VLDB Endowment: Volume 8 Issue 11, July 2015
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 16
Downloads (6 Weeks): 7,   Downloads (12 Months): 95,   Downloads (Overall): 228

Full text available: PDF
Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly graph analytics framework and native, hand-optimized code. GraphMat functions by taking vertex programs and mapping ...

12 published by ACM
June 2015 ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
Publisher: ACM
Bibliometrics:
Citation Count: 19
Downloads (6 Weeks): 28,   Downloads (12 Months): 316,   Downloads (Overall): 1,478

Full text available: PDF
Distributed in-memory key-value stores (KVSs), such as memcached, have become a critical data serving layer in modern Internet-oriented datacenter infrastructure. Their performance and efficiency directly affect the QoS of web services and the efficiency of datacenters. Traditionally, these systems have had significant overheads from inefficient network processing, OS kernel involvement, ...
Also published in:
January 2016  ACM SIGARCH Computer Architecture News - ISCA'15: Volume 43 Issue 3, June 2015

13
May 2015 IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

In this work, we revisit the 1999 Gordon Bell Prize winning PETSc-FUN3D aerodynamics code, extending it with highly-tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working ...
Keywords: CFD, Krylov Solver, Multi-core, OpenMP+MPI

14 published by ACM
April 2015 Communications of the ACM: Volume 58 Issue 5, May 2015
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 10,   Downloads (12 Months): 158,   Downloads (Overall): 1,663

Full text available: Html, PDF, PDF Chinese translation

15
November 2014 SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 6,   Downloads (12 Months): 39,   Downloads (Overall): 302

Full text available: PDF
DBSCAN is a widely used isodensity-based clustering algorithm for particle data, well known for its ability to isolate arbitrarily-shaped clusters and to filter noise data. The algorithm is super-linear (O(n log n)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm ...
Keywords: density based clustering, union-find algorithm, approximate clustering algorithm, disjoint-set data structure

16
November 2014 SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 2,   Downloads (12 Months): 45,   Downloads (Overall): 204

Full text available: PDF
A new sparse high performance conjugate gradient benchmark (HPCG) has been recently released to address challenges in the design of sparse linear solvers for the next generation extreme-scale computing systems. Key computation, data access, and communication pattern in HPCG represent building blocks commonly found in today's HPC applications. While it ...

17
November 2014 SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 4,   Downloads (12 Months): 42,   Downloads (Overall): 336

Full text available: PDF
We present an end-to-end optimization of the innovative Arbitrary high-order DERivative Discontinuous Galerkin (ADER-DG) software SeisSol targeting Intel® Xeon Phi™ coprocessor platforms, achieving unprecedented earthquake model complexity through coupled simulation of full frictional sliding and seismic wave propagation. SeisSol exploits unstructured meshes to flexibly adapt for complicated geometries ...
Keywords: hybrid parallelization, ADER-DG, earthquake simulation, dynamic rupture, heterogeneous supercomputers, SeisSol, petascale performance

18
November 2014 SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Press
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 2,   Downloads (12 Months): 14,   Downloads (Overall): 197

Full text available: PDF
The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice ...
Keywords: domain decomposition, Intel® Xeon Phi coprocessor, lattice QCD

19
June 2014 ISC 2014: Proceedings of the 29th International Conference on Supercomputing - Volume 8488
Publisher: Springer-Verlag New York, Inc.
Bibliometrics:
Citation Count: 6

The last decade has seen rapid growth of single-chip multiprocessors (CMPs), which have been leveraging Moore's law to deliver high concurrency via increases in the number of cores and vector width. Modern CMPs execute from several hundreds to several thousands concurrent operations per second, while their memory subsystem delivers from ...

20 published by ACM
June 2014 SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
Publisher: ACM
Bibliometrics:
Citation Count: 42
Downloads (6 Weeks): 10,   Downloads (12 Months): 149,   Downloads (Overall): 765

Full text available: PDF
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in a scalable manner is quite challenging. ...
Keywords: analysis, cluster, graph analytics, giraph, graphlab, Galois, combblas, frameworks, graphs, performance, socialite



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.