DOI: 10.1145/383952.384019
Article

A study of smoothing methods for language models applied to Ad Hoc information retrieval

Published: 01 September 2001

ABSTRACT

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model. A core problem in language model estimation is smoothing, which adjusts the maximum likelihood estimator so as to correct the inaccuracy due to data sparseness. In this paper, we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections.
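
The ranking idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not name the smoothing methods it compares, so the sketch uses Jelinek-Mercer linear interpolation with a collection-wide model as one common choice. The function name `rank_by_query_likelihood`, the parameter `lam`, its default value, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def rank_by_query_likelihood(query, docs, lam=0.5):
    """Rank tokenized documents by log p(query | document language model).

    Smoothing (an assumed choice, Jelinek-Mercer interpolation):
        p(w | d) = (1 - lam) * p_ml(w | d) + lam * p(w | collection)
    lam is an illustrative, untuned smoothing parameter in [0, 1].
    Returns a list of (doc_index, log_likelihood), best first.
    """
    doc_counts = [Counter(doc) for doc in docs]
    collection = Counter()
    for counts in doc_counts:
        collection.update(counts)
    collection_len = sum(collection.values())

    scores = []
    for index, counts in enumerate(doc_counts):
        doc_len = sum(counts.values())
        score = 0.0
        for word in query:
            if collection[word] == 0:
                # Assumption: query words unseen in the whole collection
                # are skipped, since they cannot distinguish documents.
                continue
            p_ml = counts[word] / doc_len if doc_len else 0.0
            p_coll = collection[word] / collection_len
            p = (1.0 - lam) * p_ml + lam * p_coll
            if p <= 0.0:
                # Without smoothing, one unseen word zeroes the likelihood.
                score = float("-inf")
                break
            score += math.log(p)
        scores.append((index, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toy_docs = [
        ["language", "model", "smoothing"],
        ["speech", "recognition", "model"],
        ["information", "retrieval", "language", "model", "retrieval"],
    ]
    for doc_index, log_p in rank_by_query_likelihood(["retrieval", "language"], toy_docs):
        print(doc_index, log_p)
```

Setting `lam=0.0` in this sketch reduces the estimate to the maximum likelihood estimator, so any document missing a single query word receives a likelihood of zero. That is the data-sparseness problem that smoothing is meant to correct.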

        • Published in

          SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
          September 2001
          454 pages
          ISBN: 1581133316
          DOI: 10.1145/383952

          Copyright © 2001 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          SIGIR '01 Paper Acceptance Rate: 47 of 201 submissions, 23%
          Overall Acceptance Rate: 792 of 3,983 submissions, 20%
