research-article

Model Agnostic Time Series Analysis via Matrix Estimation

Published: 21 December 2018

Abstract

We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, using matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix, as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite-sample analysis for imputation and prediction, and to prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model-agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise-agnostic property of our approach allows us to recover the latent states when given access only to noisy and partial observations, à la a Hidden Markov Model; e.g., recovering the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson. Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a matrix-estimation-based method (coupled with a novel, non-standard matrix estimation error metric) to solve the error-in-variable regression problem, which could be of interest in its own right. Through synthetic and real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R libraries) in the presence of missing data as well as high levels of noise.
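The three steps named in the abstract (Page matrix, matrix estimation, linear regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of hard singular value thresholding with a fixed `rank`, the zero-fill-and-rescale treatment of missing entries, and the way the forecast reuses the last `L - 1` imputed values are all assumptions made for illustration.

```python
import numpy as np


def page_matrix(x, L):
    """Arrange x into an L x (len(x) // L) matrix with non-overlapping columns."""
    x = np.asarray(x, dtype=float)
    n_cols = len(x) // L
    # Column j holds x[j*L : (j+1)*L]; leftover trailing samples are dropped.
    return x[: n_cols * L].reshape(n_cols, L).T


def hard_svt(M, rank):
    """Zero-fill NaNs, keep the top `rank` singular values, rescale by the observed fraction."""
    observed = ~np.isnan(M)
    p_hat = max(observed.mean(), 1.0 / M.size)  # guard against division by zero
    U, s, Vt = np.linalg.svd(np.where(observed, M, 0.0), full_matrices=False)
    s = s.copy()
    s[rank:] = 0.0
    return (U * s) @ Vt / p_hat


def impute_and_forecast(x, L, rank):
    """Return (imputed series, one-step-ahead forecast) for a series with NaN gaps."""
    M = page_matrix(x, L)

    # Imputation: de-noise the whole Page matrix and read it back column by column.
    imputed = hard_svt(M, rank).T.reshape(-1)

    # Forecasting: de-noise the first L-1 rows, then regress the observed
    # entries of the last row on them (regression with estimated features).
    features = hard_svt(M[:-1, :], rank)
    have_target = ~np.isnan(M[-1, :])
    beta, *_ = np.linalg.lstsq(
        features[:, have_target].T, M[-1, have_target], rcond=None
    )

    # Assumption of this sketch: predict the next value from the last L-1 imputed values.
    forecast = float(imputed[-(L - 1):] @ beta)
    return imputed, forecast


if __name__ == "__main__":
    t = np.arange(200)
    series = np.sin(2 * np.pi * t / 37)
    series[::7] = np.nan  # hide every seventh observation
    imputed, forecast = impute_and_forecast(series, L=10, rank=2)
    print(imputed[:5], forecast)
```

A single sinusoid is used in the demo because its Page matrix has rank 2, so a rank-2 truncation is a natural choice there; for other series the rank (or a threshold) would have to be chosen from the data.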


Published in

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 2, Issue 3
December 2018
248 pages
EISSN: 2476-1249
DOI: 10.1145/3301416

Copyright © 2018 ACM

Publisher

Association for Computing Machinery, New York, NY, United States