Abstract
Performance analysis of a distributed system is typically carried out by collecting profiles whose underlying events are timestamped with the unsynchronized clocks of the multiple machines in the system. To allow timestamps taken at different machines to be compared, several timestamp synchronization algorithms have been developed. However, the inaccuracies associated with these algorithms can propagate into the final results of performance analysis. To address this problem, we develop DProf, a system for constructing distributed performance profiles. At the core of DProf is a new timestamp synchronization algorithm, FreeZer, that tightly bounds the inaccuracy in a converted timestamp to a time interval. This not only allows timestamps from different machines to be compared, but also maintains strong guarantees throughout the comparison, which can be carefully transformed into guarantees for analysis results. To demonstrate the utility of DProf, we use it to implement dCSP and dCOZ, accuracy-bounded distributed versions of the Context Sensitive Profiles and Causal Profiles developed for shared-memory systems. While dCSP enables users to ascertain the existence of a performance bottleneck, dCOZ estimates the expected performance benefit from eliminating that bottleneck. Experiments with three distributed applications on a cluster of heterogeneous machines validate that inferences via dCSP and dCOZ are highly accurate. Moreover, if FreeZer is replaced by either of two existing timestamp synchronization algorithms (linear regression and convex hull), the inferences provided by dCSP and dCOZ are severely degraded.
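The idea of bounding a converted timestamp to a time interval can be illustrated with a minimal sketch. This is not FreeZer itself, whose details the abstract does not give; it only shows, under assumed names (`IntervalTimestamp`, `happened_before`, `elapsed_bounds` are all hypothetical), how interval-bounded timestamps might be compared while preserving a guarantee: an ordering is reported only when the intervals do not overlap, and elapsed time is reported as a range rather than a single value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IntervalTimestamp:
    """A remote timestamp converted to a reference clock (hypothetical type).

    The true time is assumed to lie somewhere in [lo, hi].
    """
    lo: float
    hi: float

    def __post_init__(self) -> None:
        if self.lo > self.hi:
            raise ValueError("lo must not exceed hi")


def happened_before(a: IntervalTimestamp, b: IntervalTimestamp) -> Optional[bool]:
    """Return True or False only when the order is certain, else None."""
    if a.hi < b.lo:
        return True   # every possible time of a precedes every possible time of b
    if b.hi < a.lo:
        return False  # b certainly precedes a
    return None       # intervals overlap: the order cannot be guaranteed


def elapsed_bounds(a: IntervalTimestamp, b: IntervalTimestamp) -> Tuple[float, float]:
    """Lower and upper bounds on (time of b) - (time of a)."""
    return (b.lo - a.hi, b.hi - a.lo)
```

For example, with a send event bounded to [1.0, 2.0] and a receive event bounded to [3.0, 4.0], `happened_before` returns `True` and `elapsed_bounds` returns `(1.0, 3.0)`. An analysis built on such values can carry the bounds through to its final result instead of silently absorbing the synchronization error.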
Index Terms
DProf: distributed profiler with strong guarantees