DOI: 10.1145/2783258.2783411
research-article
Open Access

Dirichlet-Hawkes Processes with Applications to Clustering Continuous-Time Document Streams

Published: 10 August 2015

ABSTRACT

Clusters in document streams, such as online news articles, can be induced by their textual content as well as by the temporal dynamics of their arrival patterns. Can we leverage both sources of information to obtain a better clustering of the documents and to distill information that cannot be extracted from content alone? In this paper, we propose a novel random process, referred to as the Dirichlet-Hawkes process, that takes both sources of information into account in a unified framework. A distinctive feature of the proposed model is that the preferential attachment of items to clusters, which in Dirichlet processes is driven by cluster sizes, is now driven by the intensities of cluster-wise self-exciting temporal point processes, the Hawkes processes. This new model establishes a previously unexplored connection between Bayesian nonparametrics and temporal point processes: the number of clusters grows to accommodate the increasing complexity of online streaming content, while the model adapts to the ever-changing dynamics of the respective continuous arrival times. We conducted large-scale experiments on both synthetic and real-world news articles, and show that Dirichlet-Hawkes processes can recover both meaningful topics and temporal dynamics, which leads to better predictive performance in terms of content perplexity and arrival time of future documents.
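
The intensity-driven preferential attachment described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the exponential triggering kernel, the parameters `alpha` and `beta`, the constant `base_rate` for opening a new cluster, and the function names. The sketch ignores document content entirely and shows only the temporal part: a new arrival joins an existing cluster with probability proportional to that cluster's current Hawkes intensity, rather than its size.

```python
import math

def cluster_intensity(t, past_times, alpha=1.0, beta=1.0):
    """Self-exciting intensity of one cluster at time t.

    Assumes an exponential triggering kernel alpha * exp(-beta * dt);
    this kernel choice is illustrative, not taken from the abstract.
    """
    return sum(alpha * math.exp(-beta * (t - s)) for s in past_times if s < t)

def assignment_probabilities(t, clusters, base_rate=0.5, alpha=1.0, beta=1.0):
    """Return (probabilities for existing clusters, probability of a new cluster).

    `clusters` is a list of lists of past arrival times, one list per cluster.
    `base_rate` is a hypothetical constant rate at which new clusters appear.
    """
    intensities = [cluster_intensity(t, times, alpha, beta) for times in clusters]
    total = base_rate + sum(intensities)
    existing = [lam / total for lam in intensities]
    return existing, base_rate / total

# Cluster 0 has three recent arrivals, cluster 1 has a single older one.
clusters = [[0.0, 0.5, 0.9], [0.1]]
existing, new = assignment_probabilities(1.0, clusters)

# Long after the last arrival, both intensities have decayed away.
stale_existing, stale_new = assignment_probabilities(50.0, clusters)
```

Under these assumptions, a recently active cluster attracts the next document more strongly than a dormant one, and once all clusters have gone quiet the new-cluster probability dominates, which is how the number of clusters can keep growing with the stream.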


Supplemental Material

p219.mp4


      Published in
        KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
        August 2015
        2378 pages
        ISBN:9781450336642
        DOI:10.1145/2783258

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        KDD '15 Paper Acceptance Rate: 160 of 819 submissions, 20%
        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
