Research Article

ML-CookGAN: Multi-Label Generative Adversarial Network for Food Image Generation

Published: 17 February 2023

Abstract

Generating food images from recipe and ingredient information can support many tasks, such as food recommendation, recipe development, and health management. Targeting the characteristics of food images, this paper proposes ML-CookGAN, a novel conditional generative adversarial network (CGAN) that generates food images from recipe and ingredient labels. The generator of ML-CookGAN, the Multi-Label Fusion Generator, converts recipe and ingredient labels into features of different granularities and generates the corresponding food images. The discriminator of ML-CookGAN, the Multi-Branch Discriminator, performs discrimination and classification with a multi-branch structure. In addition, we propose two training strategies, Region-Wise Pooling and Image Style Distillation, to improve network performance. Region-Wise Pooling lets the discriminator process region-wise features, while Image Style Distillation extracts latent image features in an unsupervised manner to assist image generation. Experiments conducted on the VIREO Food-172 dataset show that the proposed method generates high-quality Chinese food images, and that Region-Wise Pooling and Image Style Distillation enhance the diversity and realism of the generated images.
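To make the Region-Wise Pooling idea concrete, the following is a minimal NumPy sketch of one plausible reading: the discriminator's feature map is divided into a grid of spatial regions and each region is average-pooled into its own feature vector, which can then be classified per region. The function name `region_wise_pool` and the grid-based scheme are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def region_wise_pool(feature_map, grid=2):
    """Average-pool a (C, H, W) feature map over a grid x grid set of
    non-overlapping spatial regions, returning an array of shape
    (grid * grid, C): one pooled feature vector per region.
    Hypothetical sketch of region-wise pooling, not the authors' code."""
    c, h, w = feature_map.shape
    rh, rw = h // grid, w // grid  # region height and width
    regions = []
    for i in range(grid):
        for j in range(grid):
            # Slice out one spatial region and average over its pixels.
            patch = feature_map[:, i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            regions.append(patch.mean(axis=(1, 2)))
    return np.stack(regions)

# Toy 2-channel 4x4 feature map: a 2x2 grid yields 4 region vectors.
fmap = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
pooled = region_wise_pool(fmap, grid=2)
print(pooled.shape)  # (4, 2)
```

Pooling per region (rather than globally) preserves spatial locality, so a per-region classifier can penalize regions of a generated dish that lack the conditioned ingredients.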



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2s
April 2023, 545 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3572861
Editor: Abdulmotaleb El Saddik


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 5 November 2021
• Revised: 7 June 2022
• Accepted: 23 July 2022
• Online AM: 13 August 2022
• Published: 17 February 2023
