ABSTRACT
An approach to automatic translation is outlined that utilizes techniques of statistical information extraction from large databases. The method is based on the availability of pairs of large corresponding texts that are translations of each other. In our case, the texts are in English and French.

Fundamental to the technique is a complex glossary of correspondences between fixed locutions. The steps of the proposed translation process are: (1) Partition the source text into a set of fixed locutions. (2) Use the glossary plus contextual information to select the corresponding set of fixed locutions in the target language. (3) Arrange the words of the target fixed locutions into a sequence forming the target sentence.

We have developed statistical techniques facilitating both the automatic creation of the glossary and the performance of the three translation steps, all on the basis of an alignment of corresponding sentences in the two texts.

While we are not yet able to provide examples of French-English translation, we present some encouraging intermediate results concerning glossary creation and the arrangement of target word sequences.
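The three translation steps can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the authors' method. The glossary entries, probabilities and bigram scores are invented for illustration. Step 1 uses a greedy longest-match partition. Step 2 ignores context and simply takes the most probable glossary entry. Step 3 keeps each target locution contiguous and scores every permutation with a hypothetical bigram language model, which is only feasible for very short sentences.

```python
from itertools import permutations

# Hypothetical toy glossary (illustration only): English fixed locution ->
# candidate French locutions with made-up probabilities.
GLOSSARY = {
    ("the",): [(("la",), 0.6), (("le",), 0.4)],
    ("red",): [(("rouge",), 1.0)],
    ("sweet",): [(("douce",), 0.5), (("doux",), 0.5)],
    ("potato",): [(("pomme", "de", "terre"), 1.0)],
    ("sweet", "potato"): [(("patate", "douce"), 0.9),
                          (("pomme", "de", "terre", "douce"), 0.1)],
}
MAX_LEN = max(len(key) for key in GLOSSARY)

# Hypothetical target-language bigram log-probabilities; unseen pairs get a flat penalty.
BIGRAM_LOGPROB = {
    ("<s>", "la"): -1.0,
    ("la", "patate"): -1.0,
    ("patate", "douce"): -1.0,
    ("douce", "rouge"): -1.0,
    ("rouge", "</s>"): -1.0,
}
UNSEEN_LOGPROB = -10.0


def partition(words):
    """Step 1: greedy longest-match split of the source into fixed locutions."""
    locutions, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in GLOSSARY or n == 1:  # unknown single words become their own locution
                locutions.append(chunk)
                i += n
                break
    return locutions


def select(locutions):
    """Step 2 (simplified): pick the most probable target locution, ignoring context."""
    targets = []
    for loc in locutions:
        candidates = GLOSSARY.get(loc, [(loc, 1.0)])  # unknown locutions pass through unchanged
        best, _ = max(candidates, key=lambda c: c[1])
        targets.append(best)
    return targets


def score(words):
    """Sum of bigram log-probabilities over the sentence with boundary markers."""
    padded = ["<s>", *words, "</s>"]
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB) for pair in zip(padded, padded[1:]))


def arrange(targets):
    """Step 3: try every order of the target locutions (each kept contiguous)
    and return the word sequence the bigram model scores highest."""
    orders = ([w for loc in order for w in loc] for order in permutations(targets))
    return max(orders, key=score)


def translate(sentence):
    return " ".join(arrange(select(partition(sentence.lower().split()))))


print(partition("the red sweet potato".split()))  # [('the',), ('red',), ('sweet', 'potato')]
print(translate("the red sweet potato"))          # la patate douce rouge
```

The longest-match rule lets "sweet potato" win over the single-word entries. The bigram model then moves the adjective "rouge" after the noun locution. In a real system the glossary and the language model would be estimated from the aligned sentence pairs rather than written by hand.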