research-article

Benefits of Adding Hardware Support for Broadcast and Reduce Operations in MPSoC Applications

Published: 03 September 2014

Abstract

MPI has been used as a parallel programming model for supercomputers and clusters and recently in MultiProcessor Systems-on-Chip (MPSoC). One component of MPI is collective communication and its performance is key for certain parallel applications to achieve good speedups. Previous work showed that, with synthetic communication-only benchmarks, communication improvements of up to 11.4-fold and 22-fold for broadcast and reduce operations, respectively, can be achieved by providing hardware support at the network level in a Network-on-Chip (NoC). However, these numbers do not provide a good estimation of the advantage for actual applications, as there are other factors that affect performance besides communications, such as computation. To this end, we extend our previous work by evaluating the impact of hardware support over a set of five parallel application kernels of varying computation-to-communication ratios. By introducing some useful computation to the performance evaluation, we obtain more representative results of the benefits of adding hardware support for broadcast and reduce operations. The experiments show that applications with lower computation-to-communication ratios benefit the most from hardware support as they highly depend on efficient collective communications to achieve better scalability. We also extend our work by doing more analysis on clock frequency, resource usage, power, and energy. The results show reasonable scalability for resource utilization and power in the network interfaces as the number of channels increases and that, even though more power is dissipated in the network interfaces due to the added hardware, the total energy used can still be less if the actual speedup is sufficient. The application kernels are executed in a 24-embedded-processor system distributed across four FPGAs.
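
As a rough illustration of why kernels with lower computation-to-communication ratios gain the most, and of the energy trade-off described above, the following sketch uses a simple Amdahl-style model. The function names, the assumption that only the collective-communication share of runtime is accelerated, and every numeric input other than the 11.4-fold broadcast figure are illustrative assumptions, not methods or measurements from the article.

```python
def overall_speedup(comm_fraction, comm_speedup):
    """Amdahl-style estimate of whole-application speedup.

    comm_fraction: share of baseline runtime spent in collective
                   communication (hypothetical input, 0.0 to 1.0).
    comm_speedup:  factor by which that share is accelerated,
                   e.g. 11.4 for the broadcast figure in the abstract.
    Assumes computation time is unchanged by the hardware support.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    if comm_speedup <= 0.0:
        raise ValueError("comm_speedup must be positive")
    return 1.0 / ((1.0 - comm_fraction) + comm_fraction / comm_speedup)


def energy_ratio(power_ratio, speedup):
    """Energy with hardware support divided by energy without it.

    Uses E = P * t, so E_hw / E_sw = (P_hw / P_sw) / speedup.
    power_ratio is P_hw / P_sw for whatever part of the system is
    being compared (an assumption of this sketch). A result below 1.0
    means less energy is used despite the higher power draw.
    """
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return power_ratio / speedup


if __name__ == "__main__":
    # Hypothetical kernels: one communication-heavy, one compute-heavy.
    for fraction in (0.5, 0.1):
        s = overall_speedup(fraction, 11.4)
        # 1.2 is an assumed 20% power increase, not a measured value.
        print(f"comm fraction {fraction}: speedup {s:.2f}, "
              f"energy ratio {energy_ratio(1.2, s):.2f}")
```

Under this model, energy is saved only when the overall speedup exceeds the power ratio, which is why a communication-heavy kernel can come out ahead while a compute-heavy one may not.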

        • Published in

          ACM Transactions on Reconfigurable Technology and Systems  Volume 7, Issue 3
          Special Issue on 11th International Conference on Field-Programmable Technology (FPT'12) and Special Issue on the 7th International Workshop on Reconfigurable Communication-Centric Systems-on-Chip (ReCoSoC'12)
          August 2014
          199 pages
          ISSN:1936-7406
          EISSN:1936-7414
          DOI:10.1145/2664590
          Copyright © 2014 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 3 September 2014
          • Accepted: 1 February 2014
          • Revised: 1 January 2014
          • Received: 1 June 2013

          Qualifiers

          • research-article
          • Research
          • Refereed
