Low Overhead CS-Based Heterogeneous Framework for Big Data Acceleration

Published: 06 December 2017

Abstract

Big data processing on hardware has gained immense interest in the hardware research community because it offers fast processing and reconfigurability. Although hardware can reduce computation latency, the cost of big data processing is dominated by data transfers. In this article, we propose a low-overhead framework based on compressive sensing (CS) that reduces data transfers by up to 67% without affecting signal quality. CS has two important kernels: “sensing” and “reconstruction.” In this article, we focus on CS reconstruction using the orthogonal matching pursuit (OMP) algorithm. We implement the OMP CS reconstruction algorithm on a domain-specific PENC many-core platform and on a low-power Jetson TK1 platform consisting of an ARM CPU and a K1 GPU. Detailed performance analysis of the OMP algorithm on each platform shows that the PENC many-core platform consumes 15× and 18× less energy and reconstructs 16× and 8× faster than the low-power ARM CPU and K1 GPU, respectively. Furthermore, we implement the proposed CS-based framework on a heterogeneous architecture in which the PENC many-core architecture is used as an “accelerator” and processing is performed on the ARM CPU platform. For demonstration, we integrate the proposed CS-based framework with a Hadoop MapReduce platform for a face detection application. The results show that the proposed CS-based framework with the PENC many-core as an accelerator achieves a 26.15% data storage/transfer reduction, with an execution time overhead of 3.7% and an energy consumption overhead of 0.002%, for 5,000 image transfers. Compared to the CS-based framework implementation on the low-power Jetson TK1 ARM CPU+GPU platform, the PENC many-core implementation is 2.3× faster for the image reconstruction part, while achieving 29% higher performance and 34% better energy efficiency for the complete face detection application on the Hadoop MapReduce platform.
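The two CS kernels named in the abstract, sensing and OMP reconstruction, can be illustrated with a short NumPy sketch. This is a generic textbook-style OMP written for illustration, not the authors' PENC or Jetson TK1 implementation. The function name `omp`, the random Gaussian sensing matrix, and the sizes `n`, `m`, and `k` are all assumptions chosen for the example and do not come from the article.

```python
import numpy as np


def omp(Phi, y, k):
    """Estimate a k-sparse x from measurements y = Phi @ x (illustrative OMP)."""
    n = Phi.shape[1]
    residual = y.astype(float).copy()
    support = []            # indices of the columns selected so far
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(Phi.T @ residual)
        if support:
            correlations[support] = 0.0   # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then refresh the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat


# Hypothetical sizes: a length-96 signal with 3 nonzeros, 32 measurements.
rng = np.random.default_rng(0)
n, m, k = 96, 32, 3
Phi = rng.standard_normal((m, n))
Phi /= np.linalg.norm(Phi, axis=0)        # unit-norm columns

x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)

y = Phi @ x                               # "sensing": only m values are kept
x_hat = omp(Phi, y, k)                    # "reconstruction"
print("values transferred:", m, "of", n)
print("reconstruction error:", np.linalg.norm(x - x_hat))
```

In this sketch only the `m` measurements in `y` would need to be stored or transferred, and the receiver rebuilds the length-`n` signal with `omp`. The least-squares step inside the loop is the part that hardware implementations typically restructure, and the sketch makes no attempt to mirror any such optimization.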

