DOI: 10.1145/3573381.3596466
Work in Progress

Enhancing Arabic Content Generation with Prompt Augmentation Using Integrated GPT and Text-to-Image Models

Published: 29 August 2023

ABSTRACT

With the continuous advancement of text-to-image modeling, it has become critical to design prompts that make the best use of these models' capabilities and guide them toward generating the most desirable images; the field of prompt engineering has emerged to address this need. Here, we study a method that uses prompt engineering to enhance how text-to-image models represent Arabic culture. This work proposes a simple, novel approach to prompt engineering that draws on the domain knowledge of a state-of-the-art language model, GPT, to perform prompt augmentation: a simple initial prompt is expanded by a GPT model, through a process known as in-context learning, into multiple more detailed prompts related to Arabic culture across several categories. The augmented prompts are then used to generate images that better reflect Arabic culture. We conduct multiple experiments with a number of participants to evaluate the proposed method, which shows promising results, especially for generating prompts that are more inclusive of the different Arab countries and cover a wider variety of image subjects: compared to the direct approach, our method generates images with more variety 85% of the time and images that are more inclusive of the Arab countries more than 72.66% of the time.
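
As a rough illustration of the prompt-augmentation idea described in the abstract, the sketch below shows how a few-shot (in-context learning) prompt could be assembled for a GPT-style model and how the resulting augmented prompts could then be passed to a text-to-image model. The example prompts, the category list, and the call_gpt / generate_image helpers are hypothetical placeholders for this sketch, not the authors' implementation.

from typing import List

# Hypothetical placeholders: swap in a real GPT client and a real
# text-to-image backend (e.g. a latent diffusion model or Midjourney).
def call_gpt(prompt: str) -> str:
    raise NotImplementedError("plug in a GPT completion call here")

def generate_image(prompt: str) -> bytes:
    raise NotImplementedError("plug in a text-to-image call here")

# Illustrative in-context examples: each maps a simple prompt to a more
# detailed, culture-specific prompt. The actual examples and categories
# used in the paper may differ.
FEW_SHOT_EXAMPLES = [
    ("a cup of coffee",
     "a traditional Arabic coffee pot (dallah) pouring qahwa into small cups on an ornate tray"),
    ("a house",
     "an old courtyard house in Damascus with a central fountain, carved wooden doors and mosaic tiles"),
]

CATEGORIES = ["food", "architecture", "clothing", "landmarks"]

def augment_prompt(simple_prompt: str, category: str) -> str:
    """Build a few-shot prompt and ask the language model for a detailed,
    Arabic-culture-oriented version of the simple prompt."""
    lines = ["Rewrite the simple prompt as a detailed image prompt about Arabic culture."]
    for src, dst in FEW_SHOT_EXAMPLES:
        lines.append(f"Simple: {src}\nDetailed: {dst}")
    lines.append(f"Category: {category}\nSimple: {simple_prompt}\nDetailed:")
    return call_gpt("\n\n".join(lines)).strip()

def generate_images(simple_prompt: str) -> List[bytes]:
    # One augmented prompt per category, then one image per augmented prompt.
    detailed = [augment_prompt(simple_prompt, c) for c in CATEGORIES]
    return [generate_image(p) for p in detailed]

In this sketch, the few-shot examples steer the language model toward culturally detailed rewrites, and generating one augmented prompt per category is one plausible way to obtain the variety of subjects the abstract reports.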


Published in

IMX '23: Proceedings of the 2023 ACM International Conference on Interactive Media Experiences
June 2023
465 pages
ISBN: 9798400700286
DOI: 10.1145/3573381

            Copyright © 2023 Owner/Author

            Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 29 August 2023


            Qualifiers

            • Work in Progress
            • Research
            • Refereed limited

            Acceptance Rates

Overall Acceptance Rate: 69 of 245 submissions, 28%

