Acceleration of k-Means Algorithm Using Altera SDK for OpenCL

Published: 24 September 2016

Abstract

The k-means clustering algorithm iteratively partitions data into k clusters. It is one of the most popular data-mining algorithms [Wu et al. 2007] and is widely used in other applications, such as image processing and machine learning. However, k-means is highly time-consuming when the data or cluster size is large. FPGAs have traditionally shown great promise for accelerating computationally intensive algorithms, but they are harder to use for acceleration when relying on traditional HDL-based design methods. The recently introduced Altera SDK for OpenCL, a high-level synthesis tool, lets developers exploit the potential of FPGAs without long development periods or extensive hardware knowledge. This article presents an optimized implementation of the k-means clustering algorithm on an FPGA using the Altera SDK for OpenCL. Performance and power consumption are measured across various data, cluster, and dimension sizes. Compared to state-of-the-art solutions, this implementation supports larger cluster sizes, offers up to a 21x speedup over a CPU, and is more power efficient than a GPU. Unlike previous implementations, it delivers consistently high throughput across large or small feature dimensions, given reasonable cluster sizes and a large enough data size.
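For readers unfamiliar with the iteration the abstract refers to, the following is a minimal sketch of the standard (Lloyd) k-means loop in plain Python. It is not the article's OpenCL kernel; the function name `kmeans`, its parameters, and the stopping rule are illustrative assumptions. Each iteration compares every point against every centroid in every dimension, which is why the running time grows with the data, cluster, and dimension sizes.

```python
def kmeans(points, centroids, max_iters=100):
    """Standard Lloyd iteration (illustrative sketch, not the article's kernel).

    points:    sequence of equal-length numeric tuples
    centroids: initial cluster centers, same dimensionality as points
    """
    centroids = [list(c) for c in centroids]
    dims = len(centroids[0])
    labels = None  # no assignment yet, so the first pass never "converges"

    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_labels = []
        for p in points:
            dists = [sum((p[d] - c[d]) ** 2 for d in range(dims)) for c in centroids]
            new_labels.append(dists.index(min(dists)))

        # Update step: each centroid becomes the mean of its assigned points.
        sums = [[0.0] * dims for _ in centroids]
        counts = [0] * len(centroids)
        for p, j in zip(points, new_labels):
            counts[j] += 1
            for d in range(dims):
                sums[j][d] += p[d]
        for j, n in enumerate(counts):
            if n > 0:  # an empty cluster keeps its previous center
                centroids[j] = [s / n for s in sums[j]]

        # Assumed stopping rule: assignments unchanged since the previous pass.
        if new_labels == labels:
            break
        labels = new_labels

    return centroids, new_labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    centers, assignment = kmeans(data, [(0, 0), (10, 10)])
    print(centers)     # [[0.0, 0.5], [10.0, 10.5]]
    print(assignment)  # [0, 0, 1, 1]
```

The assignment step dominates the cost and is independent across points, which is the part accelerators typically parallelize.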

References

  1. Altera Corporation. 2015a. Altera SDK for OpenCL Overview. Retrieved August 3, 2016 from https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.html.
  2. Altera Corporation. 2015b. Altera SDK for OpenCL Programming Guide, version 15.0.0. Retrieved August 3, 2016 from http://www.altera.com/literature/hb/opencl-sdk/aocl_programming_guide.pdf.
  3. David Arthur and Sergei Vassilvitskii. 2007. k-means++: the advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’07). Society for Industrial and Applied Mathematics, Philadelphia, PA, 1027--1035.
  4. B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. 2012. Scalable k-means++. In Proceedings of the VLDB Endowment. 5, 7.
  5. S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, and K. Skadron. 2008. A performance study of general-purpose applications on graphics processors using CUDA. In Journal of Parallel and Distributed Computing 68, 10, 1370--1380.
  6. Y. Choi and H. So. 2014. Map-reduce processing of k-means algorithm with FPGA-accelerated computer cluster. In IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors.
  7. B. Dhanasekaran and N. Rubin. 2011. A new method for GPU based irregular reductions and its application to k-means clustering. In Proceedings of the 4th Workshop on General Purpose Processing on Graphics Processing Units (GPGPU-4’11). ACM, New York, NY, 2011.
  8. W. Fang, K. K. Lau, M. Lu, X. Xiao, C. K. Lam, P. Y. Yang, B. He, Q. Luo, P. V. Sander, and K. Yang. 2008. Parallel data mining on graphics processors. Technical report, Hong Kong University of Science and Technology.
  9. T. Gunarathne, B. Salpitikorala, G. Fox, and A. Chauhan. 2011. Optimizing OpenCL kernels for iterative statistical applications on GPUs. In Proceedings of the 2nd International Workshop on GPUs and Scientific Applications (GPUScA'11).
  10. Khronos OpenCL Working Group. 2009. The OpenCL Specification Version 1.0. Retrieved August 3, 2016 from http://www.khronos.org/registry/cl/specs/opencl-1.0.48.pdf.
  11. Y. Li, K. Zhao, X. Chu, and J. Liu. 2010. Speeding up k-means algorithm by GPUs. In 10th IEEE International Conference on Computer and Information Technology.
  12. W. Liao. 2013. Parallel K-Means Data Clustering. Retrieved August 3, 2016 from http://users.eecs.northwestern.edu/~wkliao/Kmeans/.
  13. R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. 2006. MineBench: A benchmark suite for data mining workloads. In 2006 IEEE International Symposium on Workload Characterization, 182--188, 25--27.
  14. O. Segal, M. Margala, S. R. Chalamalasetti, and M. Wright. 2014. High level programming for heterogeneous architectures. In 1st International Workshop on FPGAs for Software Programmers (FSP’14).
  15. Terasic Technologies Inc. 2014. DE5-Net FPGA Development Kit Specification. Retrieved August 3, 2016 from de5-net.terasic.com/.
  16. ThinkTank Energy Products Inc. 2015. Watt's Up Pro Power meter specifications. Retrieved August 3, 2016 from https://www.wattsupmeters.com/secure/products.php?pn=0&wai=276&spec=4
  17. R. Wu, B. Zhang, and M. Hsu. 2009. Clustering billions of data points using GPUs. In Proceedings of the Combined Workshops on UnConventional High Performance Computing Workshop Plus Memory Access Workshop (UCHPC-MAW’09).
  18. X. Wu, V. Kumar, J. Quinlan, J. Ghosh, Q. Yang, H. Motoda, A. McLachlan, A. Ng, B. Liu, Z. Zhou, M. Steinbach, D. Hand, and D. Steinberg. 2007. Top 10 algorithms in data mining. Knowledge and Information Systems, 14, 1, 1--37.

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 1
March 2017
206 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3002131
Editor: Steve Wilton

Copyright © 2016 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

  • Published: 24 September 2016
  • Accepted: 1 June 2016
  • Revised: 1 May 2016
  • Received: 1 February 2016

Qualifiers

  • research-article
  • Research
  • Refereed