Abstract
We propose a variant of Principal Component Analysis (PCA) suited to real-time applications. In the real-time version of the PCA problem, we maintain a window over the most recent data and project every incoming row into a lower-dimensional subspace, which serves as the output of the model. The goal is to minimize the error of reconstructing the input from this output while retaining the major components of earlier data distributions. We use the reconstruction error as the criterion for deciding when to update the eigenspace as new data arrive. We then propose two variants of this algorithm that are progressively more time-efficient. To verify whether the proposed model captures the changing distribution of large datasets in real time, we implemented the algorithms and evaluated them on carefully designed simulations in which the distributions of the data sources change over time in a controllable manner. Furthermore, we demonstrate that the proposed algorithms capture the changing distributions of real-life datasets by running them on data from a variety of real-time applications, e.g., localization, activity recognition, and customer expenditure. Results show that straightforward modifications that convert PCA to use a sliding window over the data do not work, because the optimal window size is difficult to determine. Instead, we propose algorithmic enhancements that rely on spectral analysis to improve dimensionality reduction. Our results show that these methods successfully capture the changing distribution of data in a real-time scenario, thus enabling real-time PCA.
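The abstract does not give the update rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It keeps a fixed-size window of recent rows, projects each incoming row onto the current top-k eigenvectors of the window covariance, and recomputes the eigenspace from scratch whenever that row's reconstruction error exceeds a threshold. The class name `WindowedPCA`, the parameters `k`, `window`, and `threshold`, and the use of power iteration for the eigenvectors are illustrative assumptions; the paper's spectral-analysis enhancements are not reproduced here.

```python
import math
import random
from collections import deque

def _mean(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def _covariance(rows, mu):
    """Sample covariance matrix (denominator n - 1) of the rows around mu."""
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [a - m for a, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j]
    denom = max(len(rows) - 1, 1)
    return [[v / denom for v in row] for row in cov]

def _matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def _orthonormalize(v, basis):
    """Gram-Schmidt v against the orthonormal vectors in basis; None if v vanishes."""
    for u in basis:
        dot = sum(x * y for x, y in zip(v, u))
        v = [x - dot * y for x, y in zip(v, u)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else None

def _top_eigenvectors(cov, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via orthogonalized power
    iteration (assumes k <= dimension)."""
    rng = random.Random(seed)
    vecs = []
    for _ in range(k):
        v = _orthonormalize([rng.gauss(0.0, 1.0) for _ in cov], vecs)
        for _ in range(iters):
            w = _orthonormalize(_matvec(cov, v), vecs)
            if w is None:  # v lies in the null space of cov; keep it
                break
            v = w
        vecs.append(v)
    return vecs

class WindowedPCA:
    """Hypothetical sketch: PCA over a sliding window whose eigenspace is
    recomputed only when an incoming row is reconstructed poorly."""

    def __init__(self, k, window, threshold):
        self.k = k
        self.window = deque(maxlen=window)  # keeps only the most recent rows
        self.threshold = threshold
        self.mu = None
        self.basis = None  # k orthonormal principal directions
        self.updates = 0  # number of eigenspace recomputations

    def _refit(self):
        rows = list(self.window)
        self.mu = _mean(rows)
        self.basis = _top_eigenvectors(_covariance(rows, self.mu), self.k)
        self.updates += 1

    def project(self, x):
        """k-dimensional coordinates of x in the current subspace."""
        c = [a - m for a, m in zip(x, self.mu)]
        return [sum(b * y for b, y in zip(u, c)) for u in self.basis]

    def reconstruction_error(self, x):
        """Euclidean distance between x and its reconstruction from the subspace."""
        z = self.project(x)
        rec = [m + sum(z[i] * self.basis[i][j] for i in range(self.k))
               for j, m in enumerate(self.mu)]
        return math.sqrt(sum((a - r) ** 2 for a, r in zip(x, rec)))

    def process(self, x):
        """Append row x to the window; return its projection (None while warming up)."""
        self.window.append(x)
        if self.basis is None:
            if len(self.window) < self.window.maxlen:
                return None  # still filling the first window
            self._refit()
        elif self.reconstruction_error(x) > self.threshold:
            self._refit()  # error too large: the distribution may have drifted
        return self.project(x)
```

Rows are fed one at a time with `model.process(row)`. On a synthetic stream whose variance moves from one axis to another, `updates` stays at 1 while the distribution is stable and grows after the shift, until the basis realigns with the new dominant direction. Note that `window` and `threshold` still have to be chosen by hand, which is the kind of tuning difficulty the abstract attributes to plain sliding-window PCA.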
Real-Time Principal Component Analysis