DOI: 10.1145/1693453.1693492

Modeling advanced collective communication algorithms on cell-based systems

Published: 09 January 2010

ABSTRACT

This paper presents and validates performance models for a variety of high-performance collective communication algorithms for systems with Cell processors. The systems modeled include a single Cell processor, two Cell chips on a Cell Blade, and a cluster of Cell Blades. The models extend PLogP, the well-known point-to-point performance model, by accounting for the unique hardware characteristics of the Cell (e.g., heterogeneous interconnects and DMA engines) and by applying the model to collective communication. This paper also presents a micro-benchmark suite to accurately measure the extended PLogP parameters on the Cell Blade and then uses these parameters to model different algorithms for the barrier, broadcast, reduce, all-reduce, and all-gather collective operations. Of 425 total performance predictions, 398 have less than 10% error relative to the actual execution time, and all have less than 15%.
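
To make the modeling idea concrete, the following is a minimal sketch of how a plain PLogP-style estimate for a binomial-tree broadcast might be computed and compared against a measurement. It is not the paper's extended model: it assumes the commonly used PLogP estimate of L + g(m) for one point-to-point message and ceil(log2 P) communication rounds for the tree. The function names, the linear gap function, and all numeric values are hypothetical and chosen only for illustration.

```python
def point_to_point_time(latency_us, gap_us):
    """Plain PLogP estimate for one message: L + g(m)."""
    return latency_us + gap_us


def binomial_broadcast_time(num_procs, latency_us, gap_fn, msg_bytes):
    """Estimate a binomial-tree broadcast as ceil(log2 P) rounds of L + g(m)."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    # (P - 1).bit_length() equals ceil(log2 P) for every integer P >= 1.
    rounds = (num_procs - 1).bit_length()
    return rounds * point_to_point_time(latency_us, gap_fn(msg_bytes))


def example_gap(msg_bytes):
    """Hypothetical gap g(m): 1.0 us fixed cost plus 0.001 us per byte."""
    return 1.0 + 0.001 * msg_bytes


def relative_error(predicted, measured):
    """Prediction error as a fraction of the measured time."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    # Hypothetical inputs: 8 processes, L = 2.0 us, 1000-byte message.
    predicted = binomial_broadcast_time(8, 2.0, example_gap, 1000)
    measured = 12.5  # made-up measurement, not a result from the paper
    print(f"predicted = {predicted:.2f} us")
    print(f"relative error = {relative_error(predicted, measured):.1%}")
```

In a real study, the latency and gap values would come from micro-benchmarks on the target hardware, and the relative error would be compared against a threshold such as the 10% figure quoted in the abstract.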


      Published in

        PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
        January 2010
        372 pages
        ISBN: 9781605588773
        DOI: 10.1145/1693453

        ACM SIGPLAN Notices, Volume 45, Issue 5
        PPoPP '10
        May 2010
        346 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1837853

        Copyright © 2010 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate: 230 of 1,014 submissions, 23%
