A Reconfigurable Architecture for the Detection of Strongly Connected Components

Published: 04 December 2015

Abstract

The Strongly Connected Components (SCCs) detection algorithm serves as a keystone for many graph analysis applications. As with many other graph algorithms, the execution time of SCC detection on large-scale graphs is dominated by memory latency. In this article, we investigate the design of a parallel hardware architecture for the detection of SCCs in directed graphs. We propose a design methodology that alleviates memory latency and the problems caused by irregular memory access. The design is composed of 16 processing elements dedicated to parallel Breadth-First Search (BFS) and eight processing elements dedicated to computing set intersections in parallel. Processing elements are organized to reuse resources and utilize memory bandwidth efficiently. We demonstrate a prototype of our design using the Convey HC-2 system, a commercial high-performance reconfigurable computing coprocessor. Our experimental results show a speedup of as much as 17× for detecting SCCs in large-scale graphs when compared to a conventional sequential software implementation.
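The abstract names two building blocks, BFS and set intersection, without spelling out how they combine. One common way to combine them is a forward-backward scheme: pick a pivot, run a BFS along outgoing edges and another along incoming edges, and take the intersection of the two reached sets as the pivot's SCC. The sequential Python sketch below illustrates that idea only. It is an assumption about the general approach, not the article's hardware design, and the function names (`bfs_reach`, `transpose`, `scc_bfs_intersection`) and the dictionary-based adjacency format are hypothetical.

```python
from collections import deque


def bfs_reach(adj, source, active):
    """Return the vertices of `active` reachable from `source` by BFS over `adj`."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in active and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def transpose(adj):
    """Build the reversed graph so that a backward search is an ordinary BFS."""
    rev = {u: [] for u in adj}
    for u, neighbours in adj.items():
        for v in neighbours:
            rev.setdefault(v, []).append(u)
    return rev


def scc_bfs_intersection(adj):
    """Sketch of forward-backward SCC detection (assumed scheme, sequential)."""
    rev = transpose(adj)
    remaining = set(adj) | set(rev)      # vertices not yet assigned to an SCC
    components = []
    while remaining:
        pivot = min(remaining)           # deterministic pivot choice (assumption)
        forward = bfs_reach(adj, pivot, remaining)
        backward = bfs_reach(rev, pivot, remaining)
        scc = forward & backward         # the intersection step
        components.append(scc)
        remaining -= scc
    return components


# Hypothetical example graph: a 3-cycle, a 2-cycle, and an isolated vertex.
example = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3], 5: []}
result = scc_bfs_intersection(example)
```

In this sketch the two BFS calls per pivot are independent of each other, and the intersection is a simple membership test per vertex, which hints at why both steps are natural candidates for dedicated parallel processing elements.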
