Abstract
Documents with timestamps, such as email and news, can be placed along a timeline. The timeline for a set of documents returned in response to a query gives an indication of how documents relevant to that query are distributed in time. Examining the timeline of a query result set allows us to characterize both how temporally dependent the topic is and how relevant the results are likely to be. We outline characteristic patterns in query result set timelines, and show experimentally that we can automatically classify documents into these classes. We also show that properties of the query result set timeline can help predict the mean average precision of a query. These results show that meta-features associated with a query can be combined with text retrieval techniques to improve our understanding and treatment of text search on documents with timestamps.
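As a rough illustration of what a query result set timeline might look like in practice, the sketch below bins the timestamps of retrieved documents by day and normalizes the counts into a distribution over time. This is only a minimal sketch under stated assumptions, not the authors' method: the function name `query_timeline`, the daily granularity, the unweighted counting, and the sample dates are all hypothetical.

```python
from collections import Counter
from datetime import date


def query_timeline(result_dates):
    """Return {day: fraction of results dated that day}, ordered by day.

    Assumption: every retrieved document counts equally; a real system
    might instead weight each document by its retrieval score.
    """
    counts = Counter(result_dates)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {day: counts[day] / total for day in sorted(counts)}


# Hypothetical timestamps for the documents returned by one query.
results = [
    date(2004, 3, 1),
    date(2004, 3, 1),
    date(2004, 3, 2),
    date(2004, 3, 9),
]

timeline = query_timeline(results)
for day, share in timeline.items():
    print(day.isoformat(), round(share, 2))
```

A timeline concentrated on a few days would suggest a temporally dependent topic, while a flat one would suggest a topic with little dependence on time; features computed from such a distribution are the kind of meta-feature the abstract refers to.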