short-paper

Open-Domain Response Generation in Low-Resource Settings using Self-Supervised Pre-Training of Warm-Started Transformers

Published: 25 March 2023

Abstract

Learning a response generation model is the core component of building an open-domain dialogue system. However, training open-domain response generation models requires large amounts of labeled data and pre-trained language generation models, both of which are often nonexistent for low-resource languages. In this article, we propose a framework for training open-domain response generation models in low-resource settings, using Dialectal Arabic (DA) as a working example. The framework first warm-starts a transformer-based encoder-decoder with pre-trained language model parameters. Next, the resulting encoder-decoder is adapted to DA through self-supervised pre-training on large-scale unlabeled data in the target dialect. Finally, the model is fine-tuned on a very small labeled dataset for open-domain response generation. The results show significant performance improvements on three spoken Arabic dialects after adopting the framework's three stages, with higher BLEU and lower perplexity scores than multiple baseline models. In particular, our models generate fluent responses in multiple dialects, with an average human-evaluated fluency score above 4. Our data is made publicly available.
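The first stage of the framework, warm-starting an encoder-decoder from pre-trained language model parameters, can be sketched with the Hugging Face Transformers library's `EncoderDecoderModel`. This is a minimal illustration, not the authors' exact implementation; the checkpoint name `UBC-NLP/MARBERT` is an assumption (any BERT-family Arabic checkpoint could be substituted), and the import is done lazily so the sketch loads without the library installed.

```python
def build_warm_started_model(checkpoint: str = "UBC-NLP/MARBERT"):
    """Warm-start a BERT2BERT-style encoder-decoder from a pre-trained
    checkpoint (stage 1 of the framework). The checkpoint name is a
    hypothetical example, not necessarily the one used in the paper."""
    # Lazy import keeps the sketch importable without transformers installed.
    from transformers import AutoTokenizer, EncoderDecoderModel

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # Both encoder and decoder are initialized from the same pre-trained
    # weights; the decoder's cross-attention layers start out random.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        checkpoint, checkpoint
    )
    # Generation requires the special-token ids to be set on the config.
    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    model.config.eos_token_id = tokenizer.sep_token_id
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = build_warm_started_model()
    # Stages 2 and 3 would then train this model: first with a
    # self-supervised denoising objective on unlabeled dialect text,
    # then on the small labeled context-response pairs.
```

Stages 2 and 3 would reuse the same model object with a standard sequence-to-sequence training loop, changing only the data and objective.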



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 4 (April 2023), 682 pages.
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3588902


        Publisher

        Association for Computing Machinery

        New York, NY, United States

Publication History

• Received: 6 March 2022
• Revised: 6 March 2022
• Accepted: 28 December 2022
• Online AM: 5 January 2023
• Published: 25 March 2023
