Abstract

Synchronization and data movement are the key impediments to efficient parallel execution. To ensure that data shared by multiple threads remain consistent, the programmer must use synchronization (e.g., mutex locks) to serialize the threads' accesses to that data, which limits parallelism by forcing threads to access shared resources sequentially. In addition, systems rely on cache coherence to ensure that processors always operate on the most up-to-date version of a value even in the presence of private caches. Coherence protocol implementations cause processors to serialize their accesses to shared data, further limiting parallelism and performance.
POSTER: An Architecture and Programming Model for Accelerating Parallel Commutative Computations via Privatization
PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming