ABSTRACT
Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model. A core problem in language model estimation is smoothing, which adjusts the maximum likelihood estimator so as to correct the inaccuracy due to data sparseness. In this paper, we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections.
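The ranking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not name the smoothing methods it compares, so Jelinek-Mercer interpolation and Dirichlet prior smoothing are used here as two common examples. The function names, the parameter defaults (`lam`, `mu`), and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum likelihood unigram model over the whole collection."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jelinek_mercer(word, doc_counts, doc_len, p_coll, lam=0.5):
    """Interpolate the document ML estimate with the collection model.

    lam is an assumed illustrative value, not a recommended setting.
    """
    p_ml = doc_counts[word] / doc_len if doc_len else 0.0
    return (1.0 - lam) * p_ml + lam * p_coll.get(word, 0.0)


def dirichlet(word, doc_counts, doc_len, p_coll, mu=2000.0):
    """Dirichlet prior smoothing: add mu pseudo-counts drawn from the collection model."""
    return (doc_counts[word] + mu * p_coll.get(word, 0.0)) / (doc_len + mu)


def query_log_likelihood(query, doc, p_coll, smooth, **params):
    """Log-likelihood of the query under the smoothed document model."""
    doc_counts = Counter(doc)
    score = 0.0
    for w in query:
        p = smooth(w, doc_counts, len(doc), p_coll, **params)
        if p <= 0.0:
            # A zero probability for any query word zeroes the whole likelihood.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, smooth, **params):
    """Return document indices ordered by decreasing query likelihood."""
    p_coll = collection_model(docs)
    scored = [
        (query_log_likelihood(query, d, p_coll, smooth, **params), i)
        for i, d in enumerate(docs)
    ]
    return [i for _, i in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Toy, hypothetical collection of pre-tokenized documents.
docs = [
    ["language", "model", "smoothing"],
    ["speech", "recognition", "model"],
    ["retrieval", "query", "language", "model"],
]
query = ["language", "smoothing"]

print(rank(query, docs, jelinek_mercer, lam=0.5))
print(rank(query, docs, dirichlet, mu=10.0))
```

Setting `lam=0.0` in `jelinek_mercer` recovers the unsmoothed maximum likelihood estimator, under which any document missing a single query word receives a likelihood of zero. That is the data-sparseness problem smoothing is meant to correct.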
A study of smoothing methods for language models applied to Ad Hoc information retrieval