DOI: 10.3115/991635.991651

A statistical approach to language translation

Published: 22 August 1988

ABSTRACT

An approach to automatic translation is outlined that utilizes techniques of statistical information extraction from large data bases. The method is based on the availability of pairs of large corresponding texts that are translations of each other. In our case, the texts are in English and French.

Fundamental to the technique is a complex glossary of correspondence of fixed locutions. The steps of the proposed translation process are: (1) Partition the source text into a set of fixed locutions. (2) Use the glossary plus contextual information to select the corresponding set of target fixed locutions. (3) Arrange the words of the target fixed locutions into a sequence forming the target sentence.

We have developed statistical techniques facilitating both the automatic creation of the glossary and the performance of the three translation steps, all on the basis of an alignment of corresponding sentences in the two texts.

While we are not yet able to provide examples of French/English translation, we present some encouraging intermediate results concerning glossary creation and the arrangement of target word sequences.
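
The three steps listed in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the hand-written `GLOSSARY`, the greedy longest-match partitioning, the context-free glossary lookup, and the tiny `BIGRAM_COUNTS` table used to order the target words. The paper derives its glossary and ordering statistics from aligned English and French text, which this sketch does not attempt.

```python
from itertools import permutations

# Hypothetical toy glossary: source locution -> target locution.
GLOSSARY = {
    ("thank", "you"): ("merci",),
    ("the",): ("la",),
    ("white",): ("blanche",),
    ("house",): ("maison",),
}

# Hypothetical target-language bigram counts used only to rank word orders.
BIGRAM_COUNTS = {
    ("<s>", "la"): 3,
    ("la", "maison"): 3,
    ("maison", "blanche"): 2,
    ("blanche", "</s>"): 2,
}


def partition(words, glossary):
    """Step 1: split the source words into locutions (greedy longest match)."""
    max_len = max(len(key) for key in glossary)
    locutions, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if candidate in glossary:
                locutions.append(candidate)
                i += n
                break
        else:
            # Unknown word: keep it as a one-word locution.
            locutions.append((words[i],))
            i += 1
    return locutions


def select(locutions, glossary):
    """Step 2: look up each locution; no contextual information is used here."""
    return [glossary.get(loc, loc) for loc in locutions]


def arrange(target_locutions, bigram_counts):
    """Step 3: pick the word order with the highest summed bigram count."""
    words = [w for loc in target_locutions for w in loc]

    def score(order):
        padded = ("<s>",) + order + ("</s>",)
        return sum(bigram_counts.get(pair, 0) for pair in zip(padded, padded[1:]))

    return list(max(permutations(words), key=score))


def translate(sentence):
    words = sentence.lower().split()
    return " ".join(arrange(select(partition(words, GLOSSARY), GLOSSARY), BIGRAM_COUNTS))


print(translate("the white house"))  # la maison blanche
```

Exhaustive permutation search is only workable for very short sentences; it is used here to keep the sketch small, not because the paper proposes it.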

Published in: COLING '88: Proceedings of the 12th conference on Computational linguistics - Volume 1, August 1988, 426 pages
ISBN: 963 8431 56 3
Publisher: Association for Computational Linguistics, United States
