Coarse-to-Fine Output Predictions for Efficient Decoding in Neural Machine Translation

Published: 16 December 2022

Abstract

Neural Machine Translation (NMT) systems are undesirably slow because the decoder must compute probability distributions over large target vocabularies. In this work, we propose a coarse-to-fine approach that reduces the complexity of the decoding process using only the information in the Softmax layer's weight matrix. The large target vocabulary is first trimmed to a small candidate set in the coarse-grained phase, and the final top-k results are then generated from this candidate set in the fine-grained phase. Tested separately on an RNN-based and a Transformer-based NMT system, our GPU-friendly method achieves a significant speed-up without harming translation quality.
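The abstract does not spell out how the candidate set is chosen from the Softmax weight matrix, but the coarse-to-fine idea can be illustrated with a common clustering-based variant: group the rows of the output weight matrix offline, score only the group representatives at decode time, and compute exact logits just for words in the surviving groups. All names below (and the use of mean centroids as coarse representatives) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d = 10_000, 64              # vocabulary size, hidden dimension
n_clusters, top_c, k = 100, 5, 10

W = rng.normal(size=(V, d))    # Softmax weight matrix: one row per target word
h = rng.normal(size=d)         # decoder hidden state at the current step

# Offline: partition vocabulary rows into coarse groups. Random assignment is
# used here for brevity; k-means over the rows of W would be a natural choice.
assign = rng.integers(0, n_clusters, size=V)
centroids = np.stack([W[assign == c].mean(axis=0) for c in range(n_clusters)])

# Coarse-grained phase: score only the cluster representatives.
coarse_scores = centroids @ h
best_clusters = np.argsort(coarse_scores)[-top_c:]

# Candidate set: every word whose cluster survived the coarse phase.
candidates = np.flatnonzero(np.isin(assign, best_clusters))

# Fine-grained phase: exact logits over the (much smaller) candidate set only.
fine_scores = W[candidates] @ h
top_k = candidates[np.argsort(fine_scores)[-k:][::-1]]
```

With these numbers the fine phase scores roughly 500 candidates instead of 10,000 words, which is the source of the speed-up; the method stays GPU-friendly because both phases are plain dense matrix-vector products.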


• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 6
  November 2022, 372 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3568970


      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 16 December 2022
      • Online AM: 7 April 2022
      • Accepted: 18 March 2022
      • Revised: 8 February 2022
      • Received: 15 June 2021
      Published in TALLIP Volume 21, Issue 6


      Qualifiers

      • short-paper
      • Refereed
