 Amir Kavyan Ziabari
 aziabari at ece.neu.edu

Bibliometrics: publication history
Average citations per article: 1.22
Citation Count: 11
Publication count: 9
Publication years: 2014-2017
Available for download: 8
Average downloads per article: 99.75
Downloads (cumulative): 798
Downloads (12 Months): 515
Downloads (6 Weeks): 47

9 results found

1 published by ACM
October 2017 MEMSYS '17: Proceedings of the International Symposium on Memory Systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 9,   Downloads (12 Months): 30,   Downloads (Overall): 30

Full text available: PDF
Even given PCM's attractive features that include high scalability and lower power, write endurance remains a critical issue that impedes the move for this technology to replace DRAM in main memory systems. The wear-out problem is further exacerbated by advances in future technologies, where cell sizes are reduced and process ...
Keywords: error correction, phase change memory, endurance, fault detection, lifetime, non-volatile memory

2
March 2017 DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe
Publisher: European Design and Automation Association
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 0,   Downloads (12 Months): 1,   Downloads (Overall): 1

Full text available: PDF
Block-level cooperation is an endurance management technique that operates on top of error correction mechanisms to extend memory lifetimes. Once an error recovery scheme fails to recover from faults in a data block, the entire physical page associated with that block is disabled and becomes unavailable to the physical address ...

3
February 2017 CGO '17: Proceedings of the 2017 International Symposium on Code Generation and Optimization
Publisher: IEEE Press
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 11,   Downloads (12 Months): 161,   Downloads (Overall): 161

Full text available: PDF
As throughput-oriented accelerators, GPUs provide tremendous processing power by running a massive number of threads in parallel. However, exploiting high degrees of thread-level parallelism (TLP) does not always translate to the peak performance that GPUs can offer, leaving the GPU’s resources often under-utilized. Compared to compute resources, memory resources can ...

4 published by ACM
December 2016 ACM Transactions on Architecture and Code Optimization (TACO): Volume 13 Issue 4, December 2016
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 9,   Downloads (12 Months): 181,   Downloads (Overall): 213

Full text available: PDF
In this article, we describe how to ease memory management between a Central Processing Unit (CPU) and one or multiple discrete Graphic Processing Units (GPUs) by architecting a novel hardware-based Unified Memory Hierarchy (UMH). Adopting UMH, a GPU accesses the CPU memory only if it does not find its required ...
Keywords: Unified memory architecture, graphics processing units, memory hierarchy, high performance computing

5 published by ACM
September 2015 NOCS '15: Proceedings of the 9th International Symposium on Networks-on-Chip
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 5,   Downloads (12 Months): 63,   Downloads (Overall): 135

Full text available: PDF
While both Chip MultiProcessors (CMPs) and Graphics Processing Units (GPUs) are many-core systems, they exhibit different memory access patterns. CMPs execute threads in parallel, where threads communicate and synchronize through the memory hierarchy (without any coalescing). GPUs on the other hand execute a large number of independent thread blocks and ...

6 published by ACM
June 2015 WCAE '15: Proceedings of the Workshop on Computer Architecture Education
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 3,   Downloads (12 Months): 17,   Downloads (Overall): 32

Full text available: PDF
Evaluating the performance of parallel and heterogeneous programs and architectures can be challenging. An emulator or simulator can be used to aid the programmer. To provide guidance and feedback to the programmer, the simulator needs to present traces, reports, and debugging information in a coherent and unambiguous format. Although these ...
Keywords: trace-driven visualization, heterogeneous systems, cycle-based simulation

7 published by ACM
June 2015 ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 5,   Downloads (12 Months): 34,   Downloads (Overall): 125

Full text available: PDF
Silicon-photonic link technology promises to satisfy the growing need for high bandwidth, low-latency and energy-efficient network-on-chip (NoC) architectures. While silicon-photonic NoC designs have been extensively studied for future many-core systems, their use in massively-threaded GPUs has received little attention to date. In this paper, we first analyze an electrical NoC ...
Keywords: gpus, network-on-chip, photonics technology

8 published by ACM
May 2015 IWOCL '15: Proceedings of the 3rd International Workshop on OpenCL
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 5,   Downloads (12 Months): 28,   Downloads (Overall): 101

Full text available: PDF
Evaluating parallel and heterogeneous programs written in OpenCL can be challenging. Commonly, simulators can be used to aid the programmer in this regard. One of the fundamental requirements of any simulator is to provide traces, reports, and debugging information in a coherent and unambiguous format. Although these traces or reports ...

9
August 2014 International Journal of High Performance Computing Applications: Volume 28 Issue 3, August 2014
Publisher: Sage Publications, Inc.
Bibliometrics:
Citation Count: 1

Graphics processing units (GPUs) have become widely accepted as the computing platform of choice in many high performance computing domains. The availability of programming standards such as OpenCL are used to leverage the inherent parallelism offered by GPUs. Source code optimizations such as loop unrolling and tiling when targeted to ...
Keywords: GPUs, fast Fourier Transform, OpenCL, optimizations, power, system-on-chip



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.