Abstract
The data-triggered threads (DTT) programming and execution model can increase parallelism and eliminate redundant computation. However, the initial proposal requires significant architectural support, which prevents existing applications and architectures from taking advantage of the model. This work proposes a pure software solution that supports the DTT model without any hardware support, using a prototype compiler and runtime libraries that run on existing machines. Several enhancements to the initial software implementation are presented that further improve performance.
The software runtime system improves the performance of serial C SPEC benchmarks by 15% on a Nehalem processor, and by over 7X across the full suite of single-thread applications. The DTT model is also shown to work in conjunction with traditional parallelism, providing up to 64X speedup over parallel applications that exploit traditional parallelism.
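As a rough illustration of the idea only (not the paper's compiler or runtime), the sketch below assumes the DTT behavior described in the original proposal: a write that actually changes a tracked value triggers a support thread, while a write of an unchanged value triggers nothing, so the redundant computation is skipped. The names `TriggeredField`, `write`, `wait`, and `trigger_count` are hypothetical and chosen just for this example.

```python
import threading


class TriggeredField:
    """Hypothetical sketch: a tracked value whose changes spawn a support thread."""

    def __init__(self, value, handler):
        self._value = value
        self._handler = handler      # computation that depends on the value
        self._thread = None          # most recently started support thread
        self.trigger_count = 0       # how many times the handler was triggered

    def write(self, new_value):
        """Store new_value; trigger the handler only if the value changed."""
        if new_value == self._value:
            return False             # redundant store: skip the computation
        self.wait()                  # let any earlier support thread finish
        self._value = new_value
        self.trigger_count += 1
        self._thread = threading.Thread(target=self._handler, args=(new_value,))
        self._thread.start()         # runs in parallel with the caller
        return True

    def wait(self):
        """Block until the outstanding support thread, if any, has finished."""
        if self._thread is not None:
            self._thread.join()
            self._thread = None


results = []
field = TriggeredField(0, lambda v: results.append(v * v))
field.write(3)   # value changed: handler is triggered
field.write(3)   # same value: no thread, no recomputation
field.write(4)   # value changed again: handler is triggered
field.wait()     # consumer waits before using the results
```

In this sketch the main thread keeps running after each triggering write and only synchronizes at `wait()`, which is where the added parallelism would come from; the skipped second write is where the redundant work is avoided.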
Index Terms
Software data-triggered threads





