Abstract
Recent advances in networking hardware have led to a new generation of Remote Memory Access (RMA) networks in which processors from different machines can communicate directly, bypassing the operating system and allowing higher performance. Researchers and practitioners have proposed libraries and programming models for RMA to enable the development of applications running on these networks.
However, the memory models implied by these RMA libraries and languages are often loosely specified and poorly understood, and they differ depending on the underlying network architecture and other factors. Hence, it is difficult to reason precisely about the semantics of RMA programs or about how changes in the network architecture affect them.
We address this problem with the following contributions: (i) a coreRMA language that serves as a common foundation, formalizing the essential characteristics of RMA programming; (ii) a complete axiomatic semantics for that language; (iii) integration of our semantics with an existing constraint solver, enabling us to exhaustively generate coreRMA programs (litmus tests) up to a specified bound and check whether the tests satisfy their specification; and (iv) extensive validation of our semantics on real-world RMA systems. We generated and ran 7441 litmus tests using each of the low-level RMA network APIs: DMAPP, VPI Verbs, and Portals 4. Our results confirmed that our model successfully captures behaviors exhibited by these networks. Moreover, we found RMA programs whose behavior is inconsistent with existing documentation, a finding confirmed by network experts.
Our work provides an important step towards understanding existing RMA networks, thus influencing the design of future RMA interfaces and hardware.
Modeling and analysis of remote memory access programming
OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications