Abstract
Quasi-Monte Carlo simulation is a variant of Monte Carlo simulation that draws its sample sets from quasi-random (low-discrepancy) sequences rather than pseudo-random numbers. In many applications, this method has proved advantageous over traditional Monte Carlo simulation, which uses pseudo-random numbers, thanks to its faster convergence and higher accuracy. This article presents the design and implementation of a massively parallel Quasi-Monte Carlo simulation engine on an FPGA-based supercomputer called Maxwell. It also compares this implementation with equivalent implementations based on graphics processing units (GPUs) and general-purpose processors (GPPs). The detailed comparison between these three implementations (FPGA vs. GPP vs. GPU) is carried out in the context of financial derivatives pricing using our Quasi-Monte Carlo simulation engine. Real hardware implementations on the Maxwell machine show that FPGAs outperform equivalent GPP-based software implementations by two orders of magnitude, with the speed-up scaling linearly with the number of processing nodes used (FPGAs/GPPs). The same implementations show that FPGAs achieve a speed-up of approximately 3x over equivalent GPU-based implementations. Power consumption measurements also show FPGAs to be 336x more energy efficient than CPUs and 16x more energy efficient than GPUs.
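To make the idea of pricing with low-discrepancy numbers concrete, here is a minimal software sketch. It is not the article's engine: it assumes a one-dimensional base-2 van der Corput sequence, a European call under Black-Scholes dynamics, and inverse-CDF sampling of the normal distribution. The function names and the parameter values in the usage example are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def van_der_corput(i: int, base: int = 2) -> float:
    """Return the i-th point (i >= 1) of the van der Corput sequence.

    The digits of i in the given base are mirrored around the radix
    point, which yields a low-discrepancy sequence in (0, 1).
    """
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call (reference value)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    cdf = NormalDist().cdf
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)


def qmc_european_call(s0, k, r, sigma, t, n):
    """Estimate the call price from n quasi-random samples.

    Each uniform point is mapped to a standard normal draw by the
    inverse CDF, then to a terminal asset price under geometric
    Brownian motion. The discounted mean payoff is the estimate.
    """
    inv_cdf = NormalDist().inv_cdf
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for i in range(1, n + 1):  # start at 1 so no point equals 0
        z = inv_cdf(van_der_corput(i))
        terminal_price = s0 * math.exp(drift + vol * z)
        total += max(terminal_price - k, 0.0)
    return math.exp(-r * t) * total / n


if __name__ == "__main__":
    # Illustrative parameters only (not taken from the article).
    params = dict(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0)
    print("QMC estimate :", qmc_european_call(n=4096, **params))
    print("Closed form  :", black_scholes_call(**params))
```

Because each sample is independent of the others, the loop body maps naturally onto parallel hardware, which is the property that FPGA, GPU, and multi-node GPP implementations all exploit. Multi-asset or path-dependent derivatives would need a multi-dimensional low-discrepancy sequence, which this sketch does not cover.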
High-Performance Quasi-Monte Carlo Financial Simulation: FPGA vs. GPP vs. GPU