Abstract
We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix, as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite-sample analysis for imputation and prediction, and to prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model-agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise-agnostic property of our approach allows us to recover the latent states when given access only to noisy and partial observations, à la a Hidden Markov Model; e.g., recovering the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson. Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a matrix-estimation-based method, coupled with a novel, non-standard matrix estimation error metric, for solving the error-in-variables regression problem, which could be of interest in its own right. Through synthetic and real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R libraries) in the presence of missing data as well as high levels of noise.
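The three steps named in the abstract (Page matrix, matrix estimation, linear regression) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes hard singular value thresholding with a user-supplied rank as the matrix estimation step, encodes missing values as NaN, and rescales by the observed fraction. The function names `page_matrix`, `denoise`, and `impute_and_forecast` and the parameters `L` and `rank` are hypothetical.

```python
import numpy as np

def page_matrix(x, L):
    """Stack consecutive, non-overlapping length-L windows of x as columns."""
    x = np.asarray(x, dtype=float)
    n_cols = len(x) // L                      # trailing leftover samples are dropped
    return x[: n_cols * L].reshape(n_cols, L).T   # shape (L, n_cols)

def denoise(M, rank):
    """Assumed matrix estimation step: zero-fill NaNs, truncate the SVD, rescale."""
    mask = ~np.isnan(M)
    p_hat = mask.mean()                       # fraction of observed entries
    filled = np.where(mask, M, 0.0)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    s[rank:] = 0.0                            # keep only the top `rank` singular values
    return (U * s) @ Vt / p_hat

def impute_and_forecast(x, L, rank):
    """Return the imputed series and a one-step-ahead forecast."""
    M = page_matrix(x, L)
    imputed = denoise(M, rank).T.reshape(-1)  # read the columns back out as a series

    # Regress the (noisy, possibly missing) last row on the de-noised first L-1 rows.
    features = denoise(M[:-1], rank)
    target = M[-1]
    obs = ~np.isnan(target)
    # rcond discards the near-zero singular values left over after truncation.
    beta, *_ = np.linalg.lstsq(features[:, obs].T, target[obs], rcond=1e-8)

    # Forecast the next value from the last L-1 imputed values
    # (aligned only when len(x) is a multiple of L).
    forecast = imputed[-(L - 1):] @ beta
    return imputed, forecast
```

For a series whose length is a multiple of `L`, calling `impute_and_forecast(x, L=10, rank=2)` returns the imputed series together with a prediction for the next time step. Because the Page matrix uses non-overlapping windows, each observation appears in exactly one entry.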
Index Terms
Model Agnostic Time Series Analysis via Matrix Estimation