Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics

Published: 11 June 2018

Abstract

This paper introduces a new approach to building distributed-memory graph analytics systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies, and programming models. The key to this approach is Gluon, a communication-optimizing substrate.

Programmers write applications in a shared-memory programming system of their choice and interface these applications with Gluon using a lightweight API. Gluon enables these programs to run on heterogeneous clusters and optimizes communication in a novel way by exploiting structural and temporal invariants of graph partitioning policies.

To demonstrate Gluon’s ability to support different programming models, we interfaced Gluon with the Galois and Ligra shared-memory graph analytics systems to produce distributed-memory versions of these systems named D-Galois and D-Ligra, respectively. To demonstrate Gluon’s ability to support heterogeneous processors, we interfaced Gluon with IrGL, a state-of-the-art single-GPU system for graph analytics, to produce D-IrGL, the first multi-GPU distributed-memory graph analytics system.

Our experiments were run on CPU clusters with up to 256 hosts and roughly 70,000 threads, and on multi-GPU clusters with up to 64 GPUs. The communication optimizations in Gluon improve end-to-end application execution time by ∼2.6× on average. D-Galois and D-IrGL scale well and are faster than Gemini, the state-of-the-art distributed CPU graph analytics system, by factors of ∼3.9× and ∼4.9×, respectively, on average.

Supplemental Material

p752-dathathri.webm
