research-article

X10 and APGAS at Petascale

Published: 06 February 2014

Abstract

X10 is a high-performance, high-productivity programming language aimed at large-scale distributed and shared-memory parallel applications. It is based on the Asynchronous Partitioned Global Address Space (APGAS) programming model, supporting the same fine-grained concurrency mechanisms within and across shared-memory nodes.

We demonstrate that X10 delivers solid performance at petascale by running (weak scaling) eight application kernels on an IBM Power 775 supercomputer utilizing up to 55,680 Power7 cores (for 1.7 Pflop/s of theoretical peak performance). We detail our advances in distributed termination detection, distributed load balancing, and use of high-performance interconnects that enable X10 to scale out to tens of thousands of cores.

For the four HPC Class 2 Challenge benchmarks, X10 achieves 41% to 87% of the system's potential at scale (as measured by IBM's HPCC Class 1 optimized runs). We also implement K-Means, Smith-Waterman, Betweenness Centrality, and Unbalanced Tree Search (UTS) for geometric trees. Our UTS implementation is the first to scale to petaflop systems.



        • Published in

          ACM SIGPLAN Notices, Volume 49, Issue 8
          PPoPP '14
          August 2014
          390 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/2692916

          PPoPP '14: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
          February 2014
          412 pages
          ISBN: 9781450326568
          DOI: 10.1145/2555243

          Copyright © 2014 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States
