GrammarViz 3.0: Interactive Discovery of Variable-Length Time Series Patterns

Abstract

The problems of recurrent and anomalous pattern discovery in time series (e.g., motifs and discords, respectively) have received considerable attention from researchers in the past decade. However, because the pattern search space is usually intractable, most existing detection algorithms require that the patterns have discriminative characteristics and that their length be known in advance and provided as input, which is an unreasonable requirement for many real-world problems. In addition, patterns of similar structure but different lengths may coexist in a time series. To address these issues, we have developed algorithms for variable-length time series pattern discovery based on symbolic discretization and grammar inference, two techniques whose combination enables a structured reduction of the search space and the discovery of candidate patterns in linear time. In this work, we present GrammarViz 3.0, a software package that provides implementations of the proposed algorithms and a graphical user interface for interactive variable-length time series pattern discovery. The current version of the software provides an alternative grammar inference algorithm that improves the time series motif discovery workflow, and introduces an experimental procedure for automated discretization parameter selection that builds upon the minimum cardinality maximum cover principle and aids the discovery of recurrent and anomalous time series patterns.
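The pipeline described above (symbolic discretization followed by grammar inference, with grammar rules marking variable-length candidate patterns) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the GrammarViz implementation: it assumes SAX (z-normalization, Piecewise Aggregate Approximation, equiprobable Gaussian breakpoints) over sliding windows with numerosity reduction as the discretization step, a simplified Re-Pair-style "replace the most frequent adjacent pair" scheme as the grammar inducer, and a per-point rule-coverage count ("rule density", where rarely covered points are weak anomaly candidates) as an illustrative heuristic. All function names are hypothetical, and the automated parameter-selection procedure is not shown.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def sax_word(xs, segments, alphabet):
    """SAX word: z-normalize, average into equal frames (PAA), then map each frame
    mean to a letter using equiprobable breakpoints of the standard normal."""
    assert len(xs) % segments == 0, "sketch assumes window divisible by segments"
    mu, sd = mean(xs), pstdev(xs)
    z = [(x - mu) / sd if sd >= 0.01 else x - mu for x in xs]  # flat: center only
    size = len(xs) // segments
    frames = [mean(z[i * size:(i + 1) * size]) for i in range(segments)]
    cuts = [NormalDist().inv_cdf(k / alphabet) for k in range(1, alphabet)]
    return "".join(LETTERS[sum(v > c for c in cuts)] for v in frames)


def discretize(series, window, segments, alphabet):
    """Sliding-window SAX with numerosity reduction: a word is kept only when it
    differs from the previous one. Returns (word, start_offset) pairs."""
    words = []
    for start in range(len(series) - window + 1):
        w = sax_word(series[start:start + window], segments, alphabet)
        if not words or words[-1][0] != w:
            words.append((w, start))
    return words


def repair(tokens):
    """Re-Pair-style induction: repeatedly replace the most frequent adjacent pair
    with a new rule R1, R2, ... until no pair occurs twice. Simplification: pair
    counts include overlapping occurrences (e.g. 'a a a')."""
    seq, rules = list(tokens), {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        name = f"R{len(rules) + 1}"
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def occurrences(seq, rules):
    """Expand the compressed sequence and record, for every rule, each occurrence
    as an inclusive (first, last) index range into the original token list."""
    found = {r: [] for r in rules}

    def expand(sym, pos):
        if sym not in rules:
            return pos + 1  # terminal: one SAX word
        left, right = rules[sym]
        end = expand(right, expand(left, pos))
        found[sym].append((pos, end - 1))
        return end

    pos = 0
    for sym in seq:
        pos = expand(sym, pos)
    return found


def rule_density(n, words, window, found):
    """Count, for each of the n time points, how many rule occurrences cover it.
    A rule spanning words first..last covers [start(first), start(last)+window-1]."""
    density = [0] * n
    for spans in found.values():
        for first, last in spans:
            for t in range(words[first][1], words[last][1] + window):
                density[t] += 1
    return density


if __name__ == "__main__":
    import math
    series = [math.sin(2 * math.pi * t / 16) for t in range(160)]
    series[80:88] = [0.0] * 8  # hypothetical injected anomaly
    words = discretize(series, window=16, segments=4, alphabet=4)
    seq, rules = repair([w for w, _ in words])
    density = rule_density(len(series), words, 16, occurrences(seq, rules))
    print(len(rules), "rules; lowest-density point:", density.index(min(density)))
```

Because each rule expands to a run of consecutive SAX words, and numerosity reduction leaves those words at irregular offsets, the time series spans recovered from rule occurrences have variable lengths even though no pattern length is supplied. The window, PAA size, and alphabet size remain inputs in this sketch, which is the kind of choice the automated discretization parameter selection mentioned above is meant to assist.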
