Abstract
The k-means clustering algorithm iteratively partitions data into k clusters. It is one of the most popular data-mining algorithms [Wu et al. 2007] and is widely used in other applications, such as image processing and machine learning. However, k-means is highly time-consuming when the data or cluster size is large. FPGAs have long shown great promise for accelerating computationally intensive algorithms, but they are harder to use for acceleration with traditional HDL-based design methods. The recent introduction of the Altera SDK for OpenCL high-level synthesis tool allows developers to exploit the potential of FPGAs without long development periods or extensive hardware knowledge. This article presents an optimized implementation of the k-means clustering algorithm on an FPGA using the Altera SDK for OpenCL. Performance and power consumption are measured with various data, cluster, and dimension sizes. Compared to state-of-the-art solutions, this implementation supports larger cluster sizes, offers up to 21x speedup over a CPU, and is more power efficient than a GPU. Unlike previous implementations, it delivers consistently high throughput across large or small feature dimensions, given reasonable cluster sizes and a large enough data size.
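For readers unfamiliar with the iteration the abstract refers to, the following is a minimal sketch of the standard (Lloyd-style) k-means loop in plain Python. It is not the article's OpenCL kernel; the function names (`squared_distance`, `assign_points`, `update_centroids`, `kmeans`), the squared Euclidean distance, the caller-supplied initial centroids, and the stop-when-labels-repeat rule are all assumptions chosen for illustration. The assignment step compares every point with every centroid in every dimension, which is why the run time grows with data, cluster, and dimension sizes.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_points(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assignment step: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update_centroids(
    points: Sequence[Point], labels: Sequence[int], centroids: Sequence[Point]
) -> List[List[float]]:
    """Update step: move each centroid to the mean of the points assigned to it."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for p, label in zip(points, labels):
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += p[d]
    updated = []
    for c in range(len(centroids)):
        if counts[c] == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            updated.append(list(centroids[c]))
        else:
            updated.append([s / counts[c] for s in sums[c]])
    return updated


def kmeans(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    max_iterations: int = 100,
) -> Tuple[List[List[float]], List[int]]:
    """Alternate assignment and update until the labels stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels: List[int] = []
    for _ in range(max_iterations):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels
```

Calling `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` groups the first two points into one cluster and the last two into another. An accelerator implementation would typically parallelize the distance computations in `assign_points`, since they are independent across points.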
References
- Altera Corporation. 2015a. Altera SDK for OpenCL Overview. Retrieved August 3, 2016 from https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.html.
- Altera Corporation. 2015b. Altera SDK for OpenCL Programming Guide, version 15.0.0. Retrieved August 3, 2016 from http://www.altera.com/literature/hb/opencl-sdk/aocl_programming_guide.pdf.
- D. Arthur and S. Vassilvitskii. 2007. k-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’07). Society for Industrial and Applied Mathematics, Philadelphia, PA, 1027--1035.
- B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. 2012. Scalable k-means++. Proceedings of the VLDB Endowment 5, 7.
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, and K. Skadron. 2008. A performance study of general-purpose applications on graphics processors using CUDA. Journal of Parallel and Distributed Computing 68, 10, 1370--1380.
- Y. Choi and H. So. 2014. Map-reduce processing of k-means algorithm with FPGA-accelerated computer cluster. In IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors.
- B. Dhanasekaran and N. Rubin. 2011. A new method for GPU based irregular reductions and its application to k-means clustering. In Proceedings of the 4th Workshop on General Purpose Processing on Graphics Processing Units (GPGPU-4’11). ACM, New York, NY.
- W. Fang, K. K. Lau, M. Lu, X. Xiao, C. K. Lam, P. Y. Yang, B. He, Q. Luo, P. V. Sander, and K. Yang. 2008. Parallel data mining on graphics processors. Technical report, Hong Kong University of Science and Technology.
- T. Gunarathne, B. Salpitikorala, G. Fox, and A. Chauhan. 2011. Optimizing OpenCL kernels for iterative statistical applications on GPUs. In Proceedings of the 2nd International Workshop on GPUs and Scientific Applications (GPUScA'11).
- Khronos OpenCL Working Group. 2009. The OpenCL Specification Version 1.0. Retrieved August 3, 2016 from http://www.khronos.org/registry/cl/specs/opencl-1.0.48.pdf.
- Y. Li, K. Zhao, X. Chu, and J. Liu. 2010. Speeding up k-means algorithm by GPUs. In 10th IEEE International Conference on Computer and Information Technology.
- W. Liao. 2013. Parallel K-Means Data Clustering. Retrieved August 3, 2016 from http://users.eecs.northwestern.edu/~wkliao/Kmeans/.
- R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. 2006. MineBench: A benchmark suite for data mining workloads. In 2006 IEEE International Symposium on Workload Characterization, 182--188, 25--27.
- O. Segal, M. Margala, S. R. Chalamalasetti, and M. Wright. 2014. High level programming for heterogeneous architectures. In 1st International Workshop on FPGAs for Software Programmers (FSP’14).
- Terasic Technologies Inc. 2014. DE5-Net FPGA Development Kit Specification. Retrieved August 3, 2016 from de5-net.terasic.com/.
- ThinkTank Energy Products Inc. 2015. Watt's Up Pro Power meter specifications. Retrieved August 3, 2016 from https://www.wattsupmeters.com/secure/products.php?pn=0&wai=276&spec=4.
- R. Wu, B. Zhang, and M. Hsu. 2009. Clustering billions of data points using GPUs. In Proceedings of the Combined Workshops on UnConventional High Performance Computing Workshop Plus Memory Access Workshop (UCHPC-MAW’09).
- X. Wu, V. Kumar, J. Quinlan, J. Ghosh, Q. Yang, H. Motoda, A. McLachlan, A. Ng, B. Liu, Z. Zhou, M. Steinbach, D. Hand, and D. Steinberg. 2007. Top 10 algorithms in data mining. Knowledge and Information Systems 14, 1, 1--37.
Index Terms
Acceleration of k-Means Algorithm Using Altera SDK for OpenCL