 Yaohua Wang

Bibliometrics: publication history
Average citations per article: 1.00
Citation count: 6
Publication count: 6
Publication years: 2010-2016
Available for download: 1
Average downloads per article: 146.00
Downloads (cumulative): 146
Downloads (12 months): 50
Downloads (6 weeks): 15



6 results found

January 2016 ACM Transactions on Architecture and Code Optimization (TACO): Volume 12 Issue 4, January 2016
Publisher: ACM
Citation Count: 0
Downloads (6 Weeks): 15,   Downloads (12 Months): 50,   Downloads (Overall): 146

Full text available: PDF
The efficacy of single instruction, multiple data (SIMD) architectures is limited when handling divergent control flows. This circumstance results in SIMD fragments using only a subset of the available lanes. We propose an iteration interleaving-based SIMD lane partition (IISLP) architecture that interleaves the execution of consecutive iterations and dynamically partitions ...
Keywords: SIMD, iteration interleaving, vector iteration, SIMD lane partition, instruction shuffle

June 2012 HPCC '12: Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems
Publisher: IEEE Computer Society
Citation Count: 0

To further improve the performance of SIMD (Single Instruction Multiple Data) architectures, which are widely used in the wireless communication domain, the main components of the Long Term Evolution (LTE) protocol are analyzed. Performance is investigated on a cycle-accurate simulator that features the main characteristics of existing SIMD architectures. Based on ...
Keywords: SIMD, LTE, MRF, Shuffle

May 2012 IPDPSW '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Publisher: IEEE Computer Society
Citation Count: 0

Hybrid architectures that combine VLIW, SIMD and multi-core schemes are increasingly prevalent in media processors, owing to the abundant parallelism that exists in media applications. However, parameters for current combinations, such as the VLIW length, SIMD width and core count, are set mainly according to simple profiling or the designer's experience ...
Keywords: VLIW, SIMD, Multi-core, Analytical Model

July 2011 NAS '11: Proceedings of the 2011 IEEE Sixth International Conference on Networking, Architecture, and Storage
Publisher: IEEE Computer Society
Citation Count: 0

The shuffle operation is one of the bottlenecks in vector DSPs. How the shuffle matrix is partitioned has a great effect on the design of the shuffle unit when small-grain data shuffles are handled with a smaller-sized crossbar. The traditional matrix block partitioning solution will bring much ...

July 2011 ISVLSI '11: Proceedings of the 2011 IEEE Computer Society Annual Symposium on VLSI
Publisher: IEEE Computer Society
Citation Count: 0

Stream processors are efficient for media applications because they exploit features of media processing such as data parallelism and producer-consumer locality. However, the loosely coupled structure between the host and the stream processor makes communication between the scalar and SIMD parts costly and scheduling across kernels less flexible. Besides, ...
Keywords: Stream Processor, Stream Length Effect, Enhanced Scalar Processor, Kernel Overlapping

September 2010 HPCC '10: Proceedings of the 2010 IEEE 12th International Conference on High Performance Computing and Communications
Publisher: IEEE Computer Society
Citation Count: 6

The emergence of large-scale chip multicore processors makes a highly parallel on-chip H.264/AVC encoder feasible. To reduce the data reload frequency, a hierarchical chip multi-core DSP platform with 64 DSP cores in total is designed to accommodate the computation- and data-intensive H.264/AVC encoder. To increase parallelism, macroblock-level parallelism is ...

The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.