
ASPIRE: exploiting asynchronous parallelism in iterative algorithms using a relaxed consistency based DSM

Published: 15 October 2014

Abstract

Many vertex-centric graph algorithms can be expressed using asynchronous parallelism by relaxing certain read-after-write data dependences, allowing threads to compute vertex values using stale (i.e., not the most recent) values of their neighboring vertices. We observe that, on distributed shared memory (DSM) systems, converting synchronous algorithms into their asynchronous counterparts makes them tolerant of high inter-node communication latency. However, high inter-node communication latency can lead to excessive use of stale values, increasing the number of iterations the algorithms need to converge. Bounding staleness limits this slowdown in convergence, but it also limits the ability to tolerate communication latency. In this paper we design a relaxed memory consistency model and consistency protocol that simultaneously tolerate communication latency and minimize the use of stale values. This is achieved through the coordinated use of a best-effort refresh policy and bounded staleness. We demonstrate that, for a range of asynchronous graph algorithms and PDE solvers, our approach on average outperforms algorithms based on prior relaxed memory models that allow stale values by at least 2.27x, and algorithms based on the Bulk Synchronous Parallel (BSP) model by 4.2x. We also show that our approach frequently outperforms GraphLab, a popular distributed graph processing framework.
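
To make the interplay between bounded staleness and best-effort refresh concrete, the following is a minimal single-threaded sketch, not the protocol from the paper. It assumes staleness is measured in iterations, that a read within the bound returns the cached copy and only requests a refresh, and that a read beyond the bound waits for a fresh copy. The names `StaleCache`, `read`, `deliver_refreshes`, and `bound` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CachedValue:
    value: float
    fetched_at: int  # iteration at which this local copy was last refreshed


class StaleCache:
    """Toy local cache of remote vertex values (hypothetical illustration)."""

    def __init__(self, remote, bound):
        self.remote = remote        # dict standing in for a remote node's memory
        self.bound = bound          # assumed staleness bound, in iterations
        self.local = {}             # vertex id -> CachedValue
        self.pending = set()        # vertices with a best-effort refresh requested
        self.blocking_fetches = 0   # reads that had to wait on the network

    def _fetch(self, v, now):
        # Copy the current remote value into the local cache.
        self.local[v] = CachedValue(self.remote[v], now)
        self.pending.discard(v)

    def read(self, v, now):
        entry = self.local.get(v)
        if entry is None or now - entry.fetched_at > self.bound:
            # Cold miss or bound exceeded: wait for a fresh copy.
            self.blocking_fetches += 1
            self._fetch(v, now)
        else:
            # Within the bound: use the cached copy and request a refresh
            # without waiting for it (best effort).
            self.pending.add(v)
        return self.local[v].value

    def deliver_refreshes(self, now):
        # Simulate the replies to outstanding refresh requests arriving.
        for v in list(self.pending):
            self._fetch(v, now)


remote = {0: 1.0}
cache = StaleCache(remote, bound=2)
first = cache.read(0, now=0)     # cold miss, blocks
remote[0] = 5.0                  # the remote owner updates the vertex
second = cache.read(0, now=1)    # stale but within the bound, no blocking
cache.deliver_refreshes(now=1)   # the best-effort refresh arrives
third = cache.read(0, now=3)     # refreshed copy is within the bound
```

In this sketch the refresh that arrives at iteration 1 keeps the later read within the bound, so only the initial cold miss blocks. Without the refresh, the read at iteration 3 would exceed the bound and stall, which is the trade-off the abstract describes.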

