Abstract
Most state-of-the-art machine-learning (ML) algorithms do not consider the computational constraints of implementing the learned model on embedded devices. These constraints include, for example, the limited depth of the arithmetic unit, the memory availability, or the battery capacity. We propose a new learning framework, Algorithmic Risk Minimization (ARM), which relies on algorithmic stability and includes these constraints in the learning process itself. ARM allows one to train advanced resource-sparing ML models and to deploy them efficiently on smart embedded systems. Finally, we show the advantages of our proposal on a smartphone-based Human Activity Recognition application by comparing it to a conventional ML approach.
Index Terms
Learning Hardware-Friendly Classifiers Through Algorithmic Stability