ABSTRACT
This paper presents ParaMeter, an interactive program analysis and visualization system for large traces. Using ParaMeter, a software developer can locate and analyze regions of code that may yield to parallelization efforts, and so possibly extract performance from multicore hardware. The key contributions of the paper are (1) a method that uses interactive visualization of traces to find and exploit parallelism, (2) interactive-speed visualization of large-scale trace dependencies, (3) interactive-speed visualization of code interactions, and (4) a BDD variable ordering for DD-compressed traces that results in fast visualization, fast analysis, and good compression. ParaMeter's effectiveness is demonstrated by finding and exploiting parallelism in 175.vpr. Measurements of ParaMeter's visualization algorithms show that they are up to seventy-five thousand times faster than prior approaches.
Index Terms
Visualizing potential parallelism in sequential programs