Abstract
We propose SSketch, a novel automated framework for efficient analysis of dynamic big data with dense (non-sparse) correlation matrices on reconfigurable platforms. SSketch targets streaming applications where each data sample can be processed only once and storage is severely limited. Our framework adaptively learns from the stream of input data and updates a corresponding ensemble of lower-dimensional data structures, also known as a sketch matrix. We introduce a new sketching methodology that maps big data with dense correlations onto an ensemble of lower-dimensional subspaces in a form suited to acceleration on reconfigurable hardware. The new method is scalable, and it significantly reduces costly memory interactions and improves matrix computation performance by leveraging the coarse-grained parallelism present in the dataset. SSketch provides an automated optimization methodology for creating the most accurate data sketch under a given set of user-defined constraints, including runtime and power, as well as platform constraints such as memory. To facilitate automation, SSketch takes a Hardware/Software (HW/SW) co-design approach: it provides an Application Programming Interface (API) that can be customized for rapid prototyping of an arbitrary matrix-based data analysis algorithm. Proof-of-concept evaluations on a variety of visual datasets with more than 11 million non-zeros demonstrate up to a 200-fold speedup of our hardware-accelerated realization of SSketch over a software-based deployment on a general-purpose processor.
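To make the streaming constraint concrete, the following is a minimal sketch of a one-pass, fixed-memory matrix sketch. It is not the SSketch methodology, whose details the abstract does not give; it swaps in a plain random-sign projection as a stand-in. The function name `stream_sketch` and the parameters `k` and `seed` are hypothetical and chosen only for illustration.

```python
import random


def stream_sketch(rows, k, seed=0):
    """One-pass random-sign sketch (illustrative stand-in, not SSketch itself).

    Each incoming row is read exactly once and folded into a k x d matrix B,
    so memory stays O(k * d) no matter how many rows arrive. With entries of
    the implicit projection drawn as +/- 1/sqrt(k), E[B^T B] equals A^T A.
    """
    rng = random.Random(seed)
    scale = 1.0 / k ** 0.5
    sketch = None
    for row in rows:  # each sample is seen once and never stored
        if sketch is None:
            sketch = [[0.0] * len(row) for _ in range(k)]
        for i in range(k):
            # Fresh random sign per (sketch row, sample) pair.
            sign = scale if rng.random() < 0.5 else -scale
            sketch_row = sketch[i]
            for j, value in enumerate(row):
                sketch_row[j] += sign * value
    return sketch if sketch is not None else []


# Hypothetical stream: 1000 samples of dimension 4, produced lazily.
stream = ([float(i + j) for j in range(4)] for i in range(1000))
B = stream_sketch(stream, k=8)
print(len(B), len(B[0]))  # 8 4
```

The sketch size `k` is the knob that trades accuracy against memory and runtime, which is the kind of constraint-driven choice the abstract says SSketch automates.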
Index Terms
Automated Real-Time Analysis of Streaming Big and Dense Data on Reconfigurable Platforms