Abstract
Streaming applications are promising targets for effectively utilizing multicores because of their inherent amenability to pipelined parallelism. While existing methods of orchestrating streaming programs on multicores have mostly been static, real-world applications show ample variations in execution time that may cause the achieved speedup and throughput to be sub-optimal. One of the principal challenges in moving towards dynamic orchestration has been the lack of approaches that can efficiently predict or accurately estimate upcoming dynamic variations in execution well before they occur.
In this paper, we propose an automated dynamic execution behavior prediction approach that can be used to efficiently estimate the time that will be spent in different pipeline stages for upcoming inputs without requiring program execution. This enables dynamic balancing or scheduling of execution to achieve better speedup. Our approach first uses dynamic taint analysis to automatically generate an input-based execution characterization of the streaming program, which identifies the key control points where variation in execution might occur, together with the input elements that cause these variations. We then use this characterization to automatically generate a light-weight emulator from the program. The emulator simulates the execution paths taken for new streaming inputs and estimates the execution time that will be spent in processing them, enabling prediction of possible dynamic variations. We present experimental evidence that our technique can accurately and efficiently estimate execution behaviors for several benchmarks. Our experiments show that dynamic orchestration using our predicted execution behavior can achieve considerably higher speedup than static orchestration.
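To make the idea of a light-weight emulator concrete, the following is a minimal sketch in Python, not the paper's implementation. It assumes that a taint pass has already reduced each pipeline stage to a fixed base cost plus a few input-dependent control points, each with a profiled cost per execution. The names `ControlPoint`, `StageModel`, `predict`, and `bottleneck`, the input fields `n_blocks` and `is_keyframe`, and all cost numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ControlPoint:
    """Hypothetical input-dependent branch or loop reported by a taint pass."""
    name: str
    # Reads only the input fields that influence this control point and
    # returns how often the guarded body runs (0/1 for a branch, a trip
    # count for a loop).
    trip_count: Callable[[dict], int]
    # Assumed profiled cost of one execution of the guarded body.
    cost_per_trip: float


@dataclass
class StageModel:
    """Assumed cost model for one pipeline stage."""
    name: str
    base_cost: float  # input-independent work in this stage
    control_points: List[ControlPoint]

    def estimate(self, item: dict) -> float:
        # Evaluate only the control decisions, never the real computation.
        variable = sum(cp.trip_count(item) * cp.cost_per_trip
                       for cp in self.control_points)
        return self.base_cost + variable


def predict(stages: List[StageModel], item: dict) -> Dict[str, float]:
    """Estimated time per stage for one upcoming input item."""
    return {stage.name: stage.estimate(item) for stage in stages}


def bottleneck(stages: List[StageModel], item: dict) -> str:
    """Name of the stage predicted to dominate for this item."""
    times = predict(stages, item)
    return max(times, key=times.get)


# Hypothetical two-stage pipeline with made-up costs.
PIPELINE = [
    StageModel("parse", 2.0, [
        ControlPoint("block_loop", lambda it: it["n_blocks"], 0.5),
    ]),
    StageModel("decode", 1.0, [
        ControlPoint("keyframe_branch",
                     lambda it: 1 if it["is_keyframe"] else 0, 10.0),
    ]),
]

if __name__ == "__main__":
    for item in ({"n_blocks": 4, "is_keyframe": False},
                 {"n_blocks": 4, "is_keyframe": True}):
        print(predict(PIPELINE, item), "bottleneck:", bottleneck(PIPELINE, item))
```

In this sketch the predicted bottleneck shifts from one stage to the other depending on a single input field, which is the kind of variation a dynamic scheduler could use to rebalance cores before the item is processed.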
Input-driven dynamic execution prediction of streaming applications
PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming