 Qing Yi

 homepage
 qingyiatacm.org

Bibliometrics: publication history
Average citations per article: 5.18
Citation Count: 176
Publication count: 34
Publication years: 1998-2016
Available for download: 18
Average downloads per article: 191.50
Downloads (cumulative): 3,447
Downloads (12 Months): 446
Downloads (6 Weeks): 25
Professional ACM Member
34 results found

Result 1 – 20 of 34

1 published by ACM
September 2016 PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 3,   Downloads (12 Months): 130,   Downloads (Overall): 212

Full text available: PDF
As virtualization becomes ubiquitous in datacenters, there is a growing interest in characterizing application performance in multi-tenant environments to improve datacenter resource management. The performance of parallel programs is notoriously difficult to reason about in virtualized environments. Although performance degradations caused by virtualization and interferences have been extensively studied, there ...
Keywords: parallel performance modeling and optimization, virtual machine scheduling, multicore systems

2
September 2015 LCPC 2015: Revised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing - Volume 9519
Publisher: Springer-Verlag New York, Inc.
Bibliometrics:
Citation Count: 0

Conventional compilers provide limited external control over the optimizations they automatically apply to attain high performance. Consequently, these optimizations have become increasingly ineffective due to the difficulty of understanding the higher-level semantics of the user applications. This paper presents a framework that provides interactive fine-grained control of compiler optimizations to ...

3 published by ACM
June 2015 SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 5,   Downloads (12 Months): 34,   Downloads (Overall): 185

Full text available: PDF
The performance of parallel programs is notoriously difficult to reason about in virtualized environments. Although performance degradations caused by virtualization and interferences have been well studied, there is little understanding of why different parallel programs have unpredictable slowdowns. We find that unpredictable performance is the result of complex interplays between the ...
Keywords: scheduling, cloud computing, parallel computing
Also published in:
June 2015  ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review: Volume 43 Issue 1, June 2015

4
May 2015 WWW '15: Proceedings of the 24th International Conference on World Wide Web
Publisher: International World Wide Web Conferences Steering Committee
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 2,   Downloads (12 Months): 44,   Downloads (Overall): 202

Full text available: PDF
A large number of extensions exist in browser vendors' online stores for millions of users to download and use. Many of those extensions process sensitive information from user inputs and webpages; however, it remains a big question whether those extensions may accidentally leak such sensitive information out of the browsers ...
Keywords: javascript, vulnerability analysis, web browser extension

5 published by ACM
May 2015 CF '15: Proceedings of the 12th ACM International Conference on Computing Frontiers
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1,   Downloads (12 Months): 13,   Downloads (Overall): 88

Full text available: PDF
As computer systems increasingly focus on balancing the performance and power efficiency of software applications together with temperature variations of the machine, they need to understand how software applications utilize the various architecture components differently. This paper develops a power and temperature modeling framework to provide such timely feedback, which ...
Keywords: machine learning, application categorization

6
December 2014 MICRO-47: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 1,   Downloads (12 Months): 9,   Downloads (Overall): 80

Full text available: PDF
General purpose compilers aim to extract the best average performance for all possible user applications. Due to the lack of specialization for different types of computations, compiler-attained performance often lags behind that of manually optimized libraries. In this paper, we demonstrate a new approach, programmable composition, to enable ...
Keywords: Programming, Automatic programming, Computers and information processing, Computer science

7 published by ACM
December 2014 ACM SIGMETRICS Performance Evaluation Review: Volume 42 Issue 3, December 2014
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 0,   Downloads (12 Months): 24,   Downloads (Overall): 106

Full text available: PDF
Many data centers are built using a fat-tree network topology because of its high bisection bandwidth. Therefore there is a need to develop analytical models for the energy behavior of fat-tree networks and examine strategies to reduce energy consumption. The most effective strategy is to power off entire switches, if possible. ...
Keywords: analysis, Data center, fat-tree, simulation

8
May 2014 IPDPS '14: Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Software-hardware co-design has become increasingly important as the scale and complexity of both are reaching an unprecedented level. To predict and understand application behavior on emerging or conceptual systems, existing research has mostly relied on cycle-accurate micro-architecture simulators, which are known to be time-consuming and are oblivious to workloads' control ...

9 published by ACM
November 2013 SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Publisher: ACM
Bibliometrics:
Citation Count: 14
Downloads (6 Weeks): 6,   Downloads (12 Months): 45,   Downloads (Overall): 296

Full text available: PDF
Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In this paper, we present a template-based optimization framework, AUGEM, which can automatically generate fully optimized assembly code for several dense linear algebra (DLA) kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring ...
Keywords: DLA code optimization, auto-tuning, code generation

10 published by ACM
October 2013 SPLASH '13: Proceedings of the 2013 companion publication for conference on Systems, programming, & applications: software for humanity
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 1,   Downloads (12 Months): 24,   Downloads (Overall): 107

Full text available: PDF
Efficient multicore programming demands fundamental data structures that support a high degree of concurrency. Existing research on non-blocking data structures promises to satisfy such demands by providing progress guarantees that allow a significant increase in parallelism while avoiding the safety hazards of lock-based synchronizations. It is well-acknowledged that the use ...
Keywords: C/C++ multithreading, multiprocessor software design, parallel data structures, concurrent data deduplication, lock-free synchronization

11
October 2013 PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Publisher: IEEE Press
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 0,   Downloads (12 Months): 19,   Downloads (Overall): 148

Full text available: PDF
Modern architectures increasingly rely on SIMD vectorization to improve performance for floating point intensive scientific applications. However, existing compiler optimization techniques for automatic vectorization are inhibited by the presence of unknown control flow surrounding partially vectorizable computations. In this paper, we present a new approach, speculative vectorization, which speculates past ...
Keywords: atlas, iterative compilation, IFKO, SIMD vectorization, speculation, compiler optimization

12
October 2013 ICPP '13: Proceedings of the 2013 42nd International Conference on Parallel Processing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

MPI is the de facto standard for portable parallel programming on high-end systems. However, while the MPI standard provides functional portability, it does not provide sufficient performance portability across platforms. We present a framework that enables users to provide hints about communication patterns used within MPI applications. These annotations are ...
Keywords: high performance computing, parallel programming, automatic programming

13 published by ACM
January 2013 ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers: Volume 9 Issue 4, January 2013
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 1,   Downloads (12 Months): 14,   Downloads (Overall): 274

Full text available: PDF
Most scientific computations serve to apply mathematical operations to a set of preconceived data structures, e.g., matrices, vectors, and grids. In this article, we use a number of widely used matrix computations from the LINPACK library to demonstrate that complex internal organizations of data structures can severely degrade the effectiveness ...
Keywords: optimization, Compiler, matrix computation, layout, pattern

14
June 2012 Software—Practice & Experience: Volume 42 Issue 6, June 2012
Publisher: John Wiley & Sons, Inc.
Bibliometrics:
Citation Count: 12

We present POET, a scripting language designed for applying advanced program transformations to code in arbitrary programming languages as well as building ad hoc translators between these languages. We have used POET to support a large number of compiler optimizations, including loop interchange, parallelization, blocking, fusion/fission, strength reduction, scalar replacement, ...
Keywords: compiler optimization, source-to-source translators, transformation language

15
May 2012 IPDPS '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 3

Reuse distance analysis is a runtime approach that has been widely used to accurately model the memory system behavior of applications. However, traditional reuse distance analysis algorithms use tree-based data structures and are hard to parallelize, missing the tremendous computing power of modern architectures such as the emerging GPUs. This ...
Keywords: Reuse distance, GPU acceleration, hash table

16 published by ACM
May 2012 CF '12: Proceedings of the 9th conference on Computing Frontiers
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 0,   Downloads (12 Months): 5,   Downloads (Overall): 191

Full text available: PDF
This paper studies the overall system power variations of two multi-core architectures, an 8-core Intel and a 32-core AMD workstation, while using these machines to execute a wide variety of sequential and multi-threaded benchmarks using varying compiler optimization settings and runtime configurations. Our extensive experimental study provides insights for answering ...
Keywords: power consumption, application level optimization, energy efficiency, compiler optimization

17
September 2011 ICPP '11: Proceedings of the 2011 International Conference on Parallel Processing
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 3

The emergence of multi-core architectures makes it essential for optimizing compilers to automatically extract parallelism for large scientific applications composed of many subroutines residing in different files. Inlining is a well-known technique which can be used to erase procedural boundaries and enable more aggressive loop parallelization. However, conventional in ...
Keywords: parallelization, inlining, annotation, tuning

18 published by ACM
May 2011 CF '11: Proceedings of the 8th ACM International Conference on Computing Frontiers
Publisher: ACM
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 3,   Downloads (12 Months): 38,   Downloads (Overall): 216

Full text available: PDF
Stencil computations are the foundation of many large applications in scientific computing. Previous research has shown that several optimization mechanisms, including rectangular blocking and time skewing combined with wavefront- and pipeline-based parallelization, can be used to significantly improve the performance of stencil kernels on multi-core architectures. However, the overall performance ...

19
April 2011 CGO '11: Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 2,   Downloads (12 Months): 7,   Downloads (Overall): 64

Full text available: PDF
We present a framework which effectively combines programmable control by developers, advanced optimization by compilers, and flexible parameterization of optimizations to achieve portable high performance. We have extended ROSE, a C/C++/Fortran source-to-source optimizing compiler, to automatically analyze scientific applications and discover optimization opportunities. Instead of directly generating optimized code, our ...

20 published by ACM
January 2011 HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Publisher: ACM
Bibliometrics:
Citation Count: 11
Downloads (6 Weeks): 0,   Downloads (12 Months): 5,   Downloads (Overall): 165

Full text available: PDF
Automatic empirical tuning of compiler optimizations has been widely used to achieve portable high performance for scientific applications. However, as power dissipation becomes increasingly important in modern architecture design, few have attempted to empirically tune optimization configurations to reduce the power consumption of applications. We provide an automated empirical tuning ...
Keywords: compiler optimizations, empirical tuning, performance, power consumption


