Efficient Branch and Bound on FPGAs Using Work Stealing and Instance-Specific Designs

Published: 27 June 2017

Abstract

Branch and bound (B&B) algorithms structure the search space as a tree and eliminate infeasible solutions early by pruning subtrees that cannot lead to a valid or optimal solution. Custom hardware designs significantly accelerate the execution of these algorithms. In this article, we demonstrate a high-performance B&B implementation on FPGAs. First, we identify general elements of B&B algorithms and describe their implementation as a finite state machine. Then, we introduce workers that autonomously cooperate using work stealing to allow parallel execution and full utilization of the target FPGA. Finally, we explore advantages of instance-specific designs that target a specific problem instance to improve performance.
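As a rough software illustration of the ideas named above (branching, bounding, pruning, and idle workers stealing open subtrees), the following minimal sketch solves a toy 0/1 knapsack instance. It is not the article's FPGA design: the knapsack problem, the function name `branch_and_bound`, the round-robin simulation of workers in a single thread, and the "steal the oldest node from the fullest queue" policy are all assumptions chosen for illustration.

```python
from collections import deque


def branch_and_bound(values, weights, capacity, num_workers=4):
    """Toy 0/1 knapsack via B&B with simulated work stealing.

    Hypothetical illustration only: workers are simulated round-robin
    in one thread rather than running in parallel hardware.
    Returns (best_value, number_of_steals).
    """
    n = len(values)
    # suffix_value[i] = total value of items i..n-1 (an optimistic bound).
    suffix_value = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_value[i] = suffix_value[i + 1] + values[i]

    # Each worker owns a deque of open nodes: (next_item, value, weight).
    queues = [deque() for _ in range(num_workers)]
    queues[0].append((0, 0, 0))  # root of the search tree
    best = 0
    steals = 0

    while any(queues):
        for q in queues:
            if not q:
                # Idle worker: steal the oldest node (nearest the root,
                # so likely the largest subtree) from the fullest queue.
                victim = max(queues, key=len)
                if not victim:
                    continue  # no work anywhere right now
                q.append(victim.popleft())
                steals += 1

            item, value, weight = q.pop()  # depth-first on own work

            if weight > capacity:
                continue  # infeasible: prune this subtree
            if value + suffix_value[item] <= best:
                continue  # bound: cannot beat the best known solution
            if item == n:
                best = value  # leaf that improves on the incumbent
                continue

            # Branch: one child skips the item, the other takes it.
            q.append((item + 1, value, weight))
            q.append((item + 1, value + values[item], weight + weights[item]))

    return best, steals


best, steals = branch_and_bound([60, 100, 120], [10, 20, 30], capacity=50)
print(best, steals)
```

Stealing from the opposite end of the victim's deque tends to hand over larger subtrees, which keeps the number of steals low; the actual stealing policy used in the hardware design is described in the full article, not here.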

We evaluate our concepts by applying them to a branch and bound problem, the reconstruction of corrupted AES keys obtained from cold-boot attacks. The evaluation shows that our work stealing approach is scalable with the available resources and provides speedups proportional to the number of workers. Instance-specific designs allow us to achieve an overall speedup of 47× compared to the fastest implementation of AES key reconstruction so far. Finally, we demonstrate how instance-specific designs can be generated just-in-time such that the provided speedups outweigh the additional time required for design synthesis.
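The just-in-time trade-off can be made concrete with a simple break-even calculation. The sketch below assumes a generic design that needs time t, an instance-specific design that is faster by a factor s, and a one-off synthesis cost t_syn; synthesis pays off when t_syn + t/s < t, that is, when t > t_syn · s / (s − 1). The function name and the numbers are hypothetical and are not taken from the article.

```python
def break_even_runtime(synthesis_time_s, speedup):
    """Smallest generic-design runtime (seconds) beyond which
    synthesizing an instance-specific design first is faster overall.

    Hypothetical model: total = synthesis_time + generic_runtime / speedup.
    """
    if speedup <= 1:
        raise ValueError("instance-specific design must be faster (speedup > 1)")
    return synthesis_time_s * speedup / (speedup - 1)


# Illustrative values only: 1 hour of synthesis, 2x faster specialized design.
print(break_even_runtime(3600, 2))
```

Under this model, larger speedups push the break-even point toward the synthesis time itself, so only instances that would run longer than roughly the synthesis time benefit from a specialized design.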
