 Harsha Vardhan Simhadri

 harshas at lbl.gov

Bibliometrics: publication history
Average citations per article: 13.22
Citation Count: 119
Publication count: 9
Publication years: 2009-2016
Available for download: 9
Average downloads per article: 259.89
Downloads (cumulative): 2,339
Downloads (12 Months): 233
Downloads (6 Weeks): 26

9 results found

1 published by ACM
July 2016 SPAA '16: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 13, Downloads (12 Months): 71, Downloads (Overall): 161

Full text available: PDF
The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs, i.e. "||" (parallel) and ";" (serial), that comprise the nested-parallel model are insufficient in expressing "partial dependencies" in a program. We propose a new dataflow composition construct "↝" to express partial dependencies ...
Keywords: dynamic programming, nested parallelism, numerical algorithms, cache-oblivious wavefront, space-bounded scheduler, parallel programming models, shared-memory multicore processors, cache-oblivious algorithms, data-flow, fork-join

2 published by ACM
June 2016 ACM Transactions on Parallel Computing (TOPC) - Special Issue for SPAA 2014: Volume 3 Issue 1, June 2016
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 3, Downloads (12 Months): 43, Downloads (Overall): 114

Full text available: PDF
The running time of nested parallel programs on shared-memory machines depends in significant part on how well the scheduler mapping the program to the machine is optimized for the organization of caches and processor cores on the machine. Recent work proposed “space-bounded schedulers” for scheduling such programs on the multilevel ...
Keywords: cache misses, multicores, Thread schedulers, memory bandwidth, work stealing, space-bounded schedulers

3 published by ACM
June 2014 SPAA '14: Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 3, Downloads (12 Months): 13, Downloads (Overall): 114

Full text available: PDF
The running time of nested parallel programs on shared memory machines depends in significant part on how well the scheduler mapping the program to the machine is optimized for the organization of caches and processors on the machine. Recent work proposed “space-bounded schedulers” for scheduling such programs on the multi-level ...
Keywords: memory bandwidth, thread schedulers, work stealing, space-bounded schedulers, cache misses, multicores

4 published by ACM
June 2013 MSPC '13: Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 1, Downloads (12 Months): 4, Downloads (Overall): 35

Full text available: PDF
In this position paper, we argue that cost models for locality in parallel machines should be program-centric, not machine-centric.
Keywords: locality, parallelism, program-centric models

5 published by ACM
June 2012 SPAA '12: Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Publisher: ACM
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 0, Downloads (12 Months): 9, Downloads (Overall): 279

Full text available: PDF
This paper presents the design, analysis, and implementation of parallel and sequential I/O-efficient algorithms for set cover, tying together the line of work on parallel set cover and the line of work on efficient set cover algorithms for large, disk-resident instances. Our contributions are twofold: First, we design and analyze ...
Keywords: approximation algorithms, external memory algorithms, max k-cover, parallel algorithms, set cover

6 published by ACM
June 2012 SPAA '12: Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Publisher: ACM
Bibliometrics:
Citation Count: 50
Downloads (6 Weeks): 4, Downloads (12 Months): 38, Downloads (Overall): 322

Full text available: PDF
This announcement describes the problem based benchmark suite (PBBS). PBBS is a set of benchmarks designed for comparing parallel algorithmic approaches, parallel programming language styles, and machine architectures across a broad set of problems. Each benchmark is defined concretely in terms of a problem specification and a set of input ...
Keywords: algorithm performance, benchmarking, parallel algorithms

7 published by ACM
June 2011 SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Publisher: ACM
Bibliometrics:
Citation Count: 19
Downloads (6 Weeks): 2, Downloads (12 Months): 10, Downloads (Overall): 442

Full text available: PDF
For nested-parallel computations with low depth (span, critical path length), analyzing the work, depth, and sequential cache complexity suffices to attain reasonably strong bounds on the parallel runtime and cache complexity on machine models with either shared or private caches. These bounds, however, do not extend to general hierarchical caches, ...
Keywords: analysis of parallel algorithms, cache complexity, cost models, parallel hierarchical memory, schedulers

8 published by ACM
June 2010 SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Publisher: ACM
Bibliometrics:
Citation Count: 29
Downloads (6 Weeks): 2, Downloads (12 Months): 40, Downloads (Overall): 648

Full text available: PDF
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation ...
Keywords: cache-oblivious algorithms, graph algorithms, multiprocessors, parallel algorithms, schedulers, sorting, sparse-matrix vector multiply

9 published by ACM
August 2009 SPAA '09: Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 1, Downloads (12 Months): 6, Downloads (Overall): 202

Full text available: PDF
Cache-oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of a multi-level cache hierarchy, regardless of the specifics (cache size and cache line size) of each level. In this paper, we describe cache-oblivious sorting algorithms with optimal work, optimal cache complexity and polylogarithmic depth. Using ...
Keywords: cache-oblivious algorithms, merging, multiprocessors, parallel algorithms, schedulers, sorting


