ABSTRACT
We present a method called TIMEMACHINE to generate a timeline of events and relations for entities in a knowledge base. For example for an actor, such a timeline should show the most important professional and personal milestones and relationships such as works, awards, collaborations, and family relationships. We develop three orthogonal timeline quality criteria that an ideal timeline should satisfy: (1) it shows events that are relevant to the entity; (2) it shows events that are temporally diverse, so they distribute along the time axis, avoiding visual crowding and allowing for easy user interaction, such as zooming in and out; and (3) it shows events that are content diverse, so they contain many different types of events (e.g., for an actor, it should show movies and marriages and awards, not just movies). We present an algorithm to generate such timelines for a given time period and screen size, based on submodular optimization and web-co-occurrence statistics with provable performance guarantees. A series of user studies using Mechanical Turk shows that all three quality criteria are crucial to produce quality timelines and that our algorithm significantly outperforms various baseline and state-of-the-art methods.
Supplemental Material
- A. Ahmed, C. H. Teo, S. Vishwanathan, and A. Smola. Fair and balanced: Learning to present news stories. In WSDM, 2012. Google Scholar
Digital Library
- J. Allan, R. Gupta, and V. Khandelwal. Temporal summaries of new topics. In SIGIR, 2001. Google Scholar
Digital Library
- T. Althoff, X. L. Dong, K. Murphy, S. Alai, V. Dang, and W. Zhang. TimeMachine: Timeline Generation for Knowledge-Base Entities. arXiv:1502.04662, 2015. Google Scholar
Digital Library
- R. A. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, New York, 1999. Google Scholar
Digital Library
- D. Bamman and N. Smith. Unsupervised discovery of biographical structure from text. TACL, 2(10):363--376, 2014.Google Scholar
Cross Ref
- K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, 2008. Google Scholar
Digital Library
- G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740--1766, 2011. Google Scholar
Digital Library
- J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998. Google Scholar
Digital Library
- B. Carterette, P. N. Bennett, D. M. Chickering, and S. T. Dumais. Here or there: Preference Judgments for Relevance. In Advances in Information Retrieval. Springer, 2008. Google Scholar
Digital Library
- A. Dasgupta, R. Kumar, and S. Ravi. Summarization through submodularity and dispersion. In ACL, 2013.Google Scholar
- Q. X. Do, W. Lu, and D. Roth. Joint inference for event timeline construction. In EMNLP-CoNLL, 2012. Google Scholar
Digital Library
- X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In SIGKDD, 2014. Google Scholar
Digital Library
- M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, and A. Tomkins. Visualizing tags over time. TWEB, 1(2):7, 2007. Google Scholar
Digital Library
- U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634--652, 1998. Google Scholar
Digital Library
- D. Graus, M.-H. Peetz, D. Odijk, O. de Rooij, and M. de Rijke. yourHistory--Semantic linking for a personalized timeline of historic events. Workshop: LinkedUp Challenge at OKCon, 2013.Google Scholar
- T. Huet, J. Biega, and F. M. Suchanek. Mining history with le monde. In AKBC, 2013. Google Scholar
Digital Library
- H. Ji, T. Cassidy, Q. Li, and S. Tamang. Tackling representation, annotation and classification challenges for temporal knowledge base population. KAIS, 2013. Google Scholar
Digital Library
- A. Kannan, S. Baker, K. Ramnath, J. Fiss, D. Lin, L. Vanderwende, R. Ansary, A. Kapoor, Q. Ke, M. Uyttendaele, et al. Mining text snippets for images on the web. In SIGKDD, 2014. Google Scholar
Digital Library
- S. M. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Trans Sig. Process., 1987.Google Scholar
Cross Ref
- A. Krause and D. Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems (to appear). Cambridge University Press, 2014.Google Scholar
- J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In SIGKDD, 2007. Google Scholar
Digital Library
- J. Li and C. Cardie. Timeline generation: tracking individuals on twitter. In WWW, 2014. Google Scholar
Digital Library
- C.-Y. Lin. Rouge: A package for automatic evaluation of summaries. In Proc. ACL Text Summarization Workshop, 2004.Google Scholar
- H. Lin and J. A. Bilmes. Learning mixtures of submodular shells with application to document summarization. In UAI, 2012.Google Scholar
Digital Library
- X. Ling and D. S. Weld. Temporal information extraction. In AAAI Conference on Artificial Intelligence, 2010.Google Scholar
- A. Mazeika, T. Tylenda, and G. Weikum. Entity timelines: Visual analytics and named entity evolution. In CIKM, 2011. Google Scholar
Digital Library
- M. Minoux. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization Techniques, pages 234--243. Springer, 1978.Google Scholar
Cross Ref
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions -- I. Mathematical Programming, 14(1):265--294, 1978.Google Scholar
Digital Library
- R. Qian. Timeline: Understanding Important Events in People's Lives.small http://blogs.bing.com/search/2014/02/21/timeline-understanding-important-events-in-peoples-lives/, February 2014. Last retrieved on Feb 18, 2015.Google Scholar
- D. Shahaf, C. Guestrin, and E. Horvitz. Metro maps of science. In SIGKDD, 2012. Google Scholar
Digital Library
- D. Shahaf, J. Yang, C. Suen, J. Jacobs, H. Wang, and J. Leskovec. Information cartography: creating zoomable, large-scale maps of information. In SIGKDD, 2013. Google Scholar
Digital Library
- W. Shen, J. Wang, and J. Han. Entity linking with a knowledge base: Issues, techniques, and solutions. TKDE, 2015.Google Scholar
Cross Ref
- R. Sipos, A. Swaminathan, P. Shivaswamy, and T. Joachims. Temporal corpus summarization using submodular word coverage. In CIKM, 2012. Google Scholar
Digital Library
- K. Spärck Jones. Automatic summarising: The state of the art. Information Processing & Management, 43(6):1449--1481, 2007. Google Scholar
Digital Library
- F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: a core of semantic knowledge. In WWW, 2007. Google Scholar
Digital Library
- F. M. Suchanek and N. Preda. Semantic Culturomics (Vision paper). In Very Large Databases (VLDB), 2014. Google Scholar
Digital Library
- R. Swan and J. Allan. Automatic generation of overview timelines. In SIGIR, 2000. Google Scholar
Digital Library
- T. Tran, A. Ceroni, M. Georgescu, K. D. Naini, and M. Fisichella. Wikipevent: Leveraging wikipedia edit history for event detection. In WISE. Springer, 2014.Google Scholar
- T. A. Tuan, S. Elbassuoni, N. Preda, and G. Weikum. CATE: Context-Aware Timeline for Entity Illustration. In WWW, 2011. Google Scholar
Digital Library
- Y. Wang, M. Zhu, L. Qu, M. Spaniol, and G. Weikum. Timely Yago: harvesting, querying, and visualizing temporal knowledge from Wikipedia. In EDBT, 2010. Google Scholar
Digital Library
- G. Weikum, N. Ntarmos, M. Spaniol, P. Triantafillou, A. A. Benczúr, S. Kirkpatrick, P. Rigaux, and M. Williamson. Longitudinal Analytics on Web Archive Data: It's About Time! In CIDR, 2011.Google Scholar
- X. W. Zhao, Y. Guo, R. Yan, Y. He, and X. Li. Timeline generation with social attention. In SIGIR, 2013. Google Scholar
Digital Library
Index Terms
TimeMachine: Timeline Generation for Knowledge-Base Entities
Recommendations
Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data MiningDespite having various attractive qualities such as high prediction accuracy and the ability to quantify uncertainty and avoid over-fitting, Bayesian Matrix Factorization has not been widely adopted because of the prohibitive cost of inference. In this ...
Timeline generation: tracking individuals on twitter
WWW '14: Proceedings of the 23rd international conference on World wide webIn this paper, we preliminarily learn the problem of reconstructing users' life history based on the their Twitter stream and proposed an unsupervised framework that create a chronological list for personal important events (PIE) of individuals. By ...
Estimating Local Intrinsic Dimensionality
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data MiningThis paper is concerned with the estimation of a local measure of intrinsic dimensionality (ID) recently proposed by Houle. The local model can be regarded as an extension of Karger and Ruhl's expansion dimension to a statistical setting in which the ...





Comments