Abstract
Branch and bound (B&B) algorithms structure the search space as a tree and eliminate infeasible solutions early by pruning subtrees that cannot lead to a valid or optimal solution. Custom hardware designs significantly accelerate the execution of these algorithms. In this article, we demonstrate a high-performance B&B implementation on FPGAs. First, we identify general elements of B&B algorithms and describe their implementation as a finite state machine. Then, we introduce workers that autonomously cooperate using work stealing to allow parallel execution and full utilization of the target FPGA. Finally, we explore advantages of instance-specific designs that target a specific problem instance to improve performance.
We evaluate our concepts by applying them to a branch and bound problem: the reconstruction of corrupted AES keys obtained from cold-boot attacks. The evaluation shows that our work stealing approach scales with the available resources and provides speedups proportional to the number of workers. Instance-specific designs allow us to achieve an overall speedup of 47× over the fastest previous implementation of AES key reconstruction. Finally, we demonstrate how instance-specific designs can be generated just-in-time such that the provided speedups outweigh the additional time required for design synthesis.
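To make the two central ideas of the abstract concrete, the following is a minimal software sketch of branch and bound with work stealing. It is not the article's FPGA design: it is a single-threaded simulation in Python, and the toy problem (0/1 knapsack), the function name `solve`, the round-robin scheduling, and the policy of stealing the oldest node from the fullest queue are all assumptions chosen for illustration. Weights are assumed to be positive.

```python
from collections import deque


def solve(values, weights, capacity, num_workers=4):
    """Toy 0/1 knapsack solved by branch and bound with simulated work stealing.

    Hypothetical illustration only: workers are simulated round-robin in one
    thread, and each worker owns a deque of unexplored tree nodes.
    """
    n = len(values)
    # Sort items by value density so the fractional bound below is valid.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]

    def bound(level, value, room):
        # Optimistic estimate: fill the remaining room greedily and allow
        # a fraction of the first item that no longer fits.
        for i in range(level, n):
            if w[i] <= room:
                room -= w[i]
                value += v[i]
            else:
                return value + v[i] * room / w[i]
        return value

    queues = [deque() for _ in range(num_workers)]
    queues[0].append((0, 0, capacity))  # node = (level, value so far, room left)
    best = 0
    steals = 0

    while any(queues):
        for q in queues:
            if not q:
                # Idle worker: steal the oldest node from the fullest queue.
                victim = max(queues, key=len)
                if not victim:
                    continue  # no work left anywhere in this round
                q.append(victim.popleft())
                steals += 1
            level, value, room = q.pop()  # the owner works depth-first
            best = max(best, value)  # every node is a feasible partial choice
            if level == n or bound(level, value, room) <= best:
                continue  # prune: this subtree cannot beat the best solution
            # Branch: skip the item, or take it if it still fits.
            q.append((level + 1, value, room))
            if w[level] <= room:
                q.append((level + 1, value + v[level], room - w[level]))
    return best, steals
```

For example, `solve([60, 100, 120], [10, 20, 30], 50)` returns a best value of 220 together with the number of steals that occurred. Stealing from the old end of a victim's queue tends to hand over nodes near the root of the tree, which usually represent larger subtrees and therefore more work per steal.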
Efficient Branch and Bound on FPGAs Using Work Stealing and Instance-Specific Designs