DOI: 10.1145/3491003.3491017
Research article (Public Access)

Distributed Matrix Tiling Using A Hypergraph Labeling Formulation

Published: 24 January 2022

ABSTRACT

Partitioning large matrices is an important problem in distributed linear algebra, which is used in machine learning (ML), among other areas. Our goal is to perform a sequence of matrix algebra operations on these large matrices in a distributed manner. However, a partitioning scheme that suits one matrix algebra operation, or one implementation (algorithm) of it, may perform poorly for another. This is a type of data tiling problem. In this paper we study this data tiling problem through a hypergraph labeling formulation. We prove hardness results for the problem and give a theoretical characterization of its complexity on random instances. We also develop a greedy algorithm and demonstrate its efficacy experimentally.
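To make the labeling view concrete, here is a small Python sketch. It is our own toy model, not the paper's formulation or its greedy algorithm: matrices are vertices, each operation is a hyperedge over its operands, and each vertex receives a tiling label drawn from a hypothetical set (`row`, `col`, `block`). Each operation lists candidate implementations, and each implementation requires specific labels for its operands; an operand whose label does not match is charged its size, as an assumed stand-in for re-tiling communication. The cost model, the label set, the example operations, and all names (`edge_cost`, `total_cost`, `greedy_label`) are illustrative assumptions.

```python
# Toy model (hypothetical, not the paper's): matrices are hypergraph
# vertices, operations are hyperedges, and a labeling assigns each
# matrix a tiling scheme.
LABELS = ("row", "col", "block")  # hypothetical tiling schemes


def edge_cost(edge, labels, sizes):
    """Cost of one operation under its cheapest candidate algorithm.

    An operand whose label differs from what the algorithm requires is
    charged its size (a stand-in for re-tiling communication). Operands
    that are not labeled yet cost nothing, so partial labelings can be scored.
    """
    return min(
        sum(sizes[m] for m, need in required.items()
            if m in labels and labels[m] != need)
        for required in edge["algorithms"]
    )


def total_cost(edges, labels, sizes):
    """Sum of per-operation costs over the whole hypergraph."""
    return sum(edge_cost(e, labels, sizes) for e in edges)


def greedy_label(sizes, edges):
    """Label matrices one at a time, largest first, with the locally cheapest tiling."""
    labels = {}
    for m in sorted(sizes, key=sizes.get, reverse=True):
        # Only operations that touch matrix m are affected by its label.
        incident = [e for e in edges if any(m in alg for alg in e["algorithms"])]
        labels[m] = min(
            LABELS,
            key=lambda lab: sum(edge_cost(e, {**labels, m: lab}, sizes) for e in incident),
        )
    return labels


# Hypothetical example: T = A @ B, then U = transpose(T).
# The "op" key is descriptive only; "algorithms" lists required labels per operand.
sizes = {"A": 100, "B": 80, "T": 60, "U": 60}
edges = [
    {"op": "T = A @ B", "algorithms": [
        {"A": "row", "B": "block", "T": "row"},
        {"A": "block", "B": "block", "T": "block"},
    ]},
    {"op": "U = transpose(T)", "algorithms": [
        {"T": "row", "U": "col"},
        {"T": "col", "U": "row"},
        {"T": "block", "U": "block"},
    ]},
]

labels = greedy_label(sizes, edges)
print(labels, total_cost(edges, labels, sizes))
```

On this small instance the greedy pass labels `A` as `row`, `B` as `block`, `T` as `row`, and `U` as `col`, for a total cost of 0, because each operation has an implementation whose required tilings agree. Ordering matrices by size is one plausible heuristic (it fixes the most expensive operands first); a one-pass greedy labeling is not guaranteed to be optimal in general.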


Published in

ICDCN '22: Proceedings of the 23rd International Conference on Distributed Computing and Networking
ACM Other conferences
January 2022
298 pages
ISBN: 9781450395601
DOI: 10.1145/3491003

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited