Hardware support for fine-grained event-driven computation in Anton 2

Published: 16 March 2013

Abstract

Exploiting parallelism to accelerate a computation typically involves dividing it into many small tasks that can be assigned to different processing elements. An efficient execution schedule for these tasks can be difficult or impossible to determine in advance, however, if there is uncertainty as to when each task's input data will be available. Ideally, each task would run in direct response to the arrival of its input data, thus allowing the computation to proceed in a fine-grained event-driven manner. Realizing this ideal is difficult in practice, and typically requires sacrificing flexibility for performance.

In Anton 2, a massively parallel special-purpose supercomputer for molecular dynamics simulations, we addressed this challenge by including a hardware block, called the dispatch unit, that provides flexible and efficient support for fine-grained event-driven computation. Its novel features include a many-to-many mapping from input data to a set of synchronization counters, and the ability to prioritize tasks based on their type. To solve the additional problem of using a fixed set of synchronization counters to track input data for a potentially large number of tasks, we created a software library that allows programmers to treat Anton 2 as an idealized machine with infinitely many synchronization counters. The dispatch unit, together with this library, made it possible to simplify our molecular dynamics software by expressing it as a collection of independent tasks, and the resulting fine-grained execution schedule improved overall performance by up to 16% relative to a coarse-grained schedule for precisely the same computation.
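The counter-based dispatch idea described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Anton 2 hardware interface or its software library: the class `DispatchSketch`, its methods, the task names, and the input tags are all hypothetical. It shows one arriving input decrementing several synchronization counters (the many-to-many mapping), and ready tasks being selected by a per-task priority standing in for task type.

```python
import heapq
from collections import defaultdict


class DispatchSketch:
    """Toy software model of counter-based, event-driven task dispatch.

    Hypothetical illustration only; not the Anton 2 dispatch unit's interface.
    """

    def __init__(self):
        self.remaining = {}                    # counter id -> arrivals still awaited
        self.task_of = {}                      # counter id -> (priority, task name)
        self.counters_for = defaultdict(list)  # input tag -> counter ids it updates
        self.ready = []                        # min-heap of (priority, seq, task name)
        self.seq = 0                           # tie-breaker: FIFO within a priority

    def add_task(self, counter, name, priority, inputs):
        """Register a task that becomes ready once every tag in `inputs` arrives."""
        self.remaining[counter] = len(inputs)
        self.task_of[counter] = (priority, name)
        for tag in inputs:
            self.counters_for[tag].append(counter)

    def arrive(self, tag):
        """Deliver one input; it may decrement several counters (many-to-many)."""
        for counter in self.counters_for.pop(tag, []):
            self.remaining[counter] -= 1
            if self.remaining[counter] == 0:
                priority, name = self.task_of[counter]
                heapq.heappush(self.ready, (priority, self.seq, name))
                self.seq += 1

    def next_task(self):
        """Return the highest-priority ready task name, or None if none is ready."""
        if not self.ready:
            return None
        return heapq.heappop(self.ready)[2]


def demo():
    unit = DispatchSketch()
    # Hypothetical tasks and tags; a lower priority number runs first.
    unit.add_task(0, "force_ab", priority=1, inputs=["pos_a", "pos_b"])
    unit.add_task(1, "force_bc", priority=0, inputs=["pos_b", "pos_c"])
    order = []
    for tag in ["pos_a", "pos_c", "pos_b"]:
        unit.arrive(tag)
        task = unit.next_task()
        while task is not None:  # drain everything that just became ready
            order.append(task)
            task = unit.next_task()
    return order
```

In this sketch, `demo()` returns `['force_bc', 'force_ab']`: neither task can run until `pos_b` arrives, that single arrival completes both counters, and the task with the lower priority number is dispatched first. The model uses an unbounded dictionary of counters, so it does not address the fixed-counter limitation that the abstract says the software library was written to hide.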

Published in

ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13), April 2013, 540 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499368

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, March 2013, 574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
