Abstract
Generating food images from recipe and ingredient information can support many tasks, such as food recommendation, recipe development, and health management. To account for the characteristics of food images, this paper proposes ML-CookGAN, a novel conditional GAN (CGAN) that generates food images from recipe and ingredient labels. Its generator, the Multi-Label Fusion Generator, converts recipe and ingredient labels into features of different granularities and generates the corresponding food images. Its discriminator, the Multi-Branch Discriminator, performs both real/fake discrimination and classification with a multi-branch structure. In addition, we propose two training strategies, Region-Wise Pooling and Image Style Distillation, to improve network performance. Region-Wise Pooling lets the discriminator handle region-wise features, and Image Style Distillation extracts latent image features in an unsupervised manner to assist image generation. Experiments on the VIREO Food-172 dataset show that the proposed method generates high-quality Chinese food images, and that Region-Wise Pooling and Image Style Distillation enhance the diversity and realism of the generated images.
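The abstract describes a generator conditioned on two kinds of labels (one recipe category, several ingredients) and a discriminator with separate branches for discrimination and classification. The sketch below illustrates one common way such a setup can be wired, in the style of auxiliary-classifier GANs: a hypothetical `encode_labels` helper builds a one-hot recipe vector plus a multi-hot ingredient vector, and a hypothetical `discriminator_loss` sums a real/fake term, a single-label (softmax) recipe term, and a multi-label (sigmoid) ingredient term. All function names, the equal loss weights, and the label format are assumptions for illustration, not the paper's actual formulation.

```python
import math
from typing import List, Sequence


def encode_labels(recipe_id: int, ingredient_ids: Sequence[int],
                  n_recipes: int, n_ingredients: int) -> List[float]:
    """Hypothetical conditioning vector: one-hot recipe + multi-hot ingredients."""
    if not 0 <= recipe_id < n_recipes:
        raise ValueError("recipe_id out of range")
    recipe = [0.0] * n_recipes
    recipe[recipe_id] = 1.0
    ingredients = [0.0] * n_ingredients
    for i in ingredient_ids:
        if not 0 <= i < n_ingredients:
            raise ValueError("ingredient id out of range")
        ingredients[i] = 1.0  # several ingredients may be active at once
    return recipe + ingredients


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Single-label loss for an assumed recipe-classification branch."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def sigmoid_bce(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy on logits (used for multi-label outputs)."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def discriminator_loss(adv_logit: float, is_real: bool,
                       recipe_logits: Sequence[float], recipe_id: int,
                       ingredient_logits: Sequence[float],
                       ingredient_targets: Sequence[float]) -> float:
    """Sum of three assumed branch losses with equal (assumed) weights."""
    adv = sigmoid_bce([adv_logit], [1.0 if is_real else 0.0])
    recipe = softmax_cross_entropy(recipe_logits, recipe_id)
    ingredients = sigmoid_bce(ingredient_logits, ingredient_targets)
    return adv + recipe + ingredients


if __name__ == "__main__":
    cond = encode_labels(recipe_id=2, ingredient_ids=[0, 3],
                         n_recipes=4, n_ingredients=5)
    print(cond)
    print(discriminator_loss(0.0, True, [0.0, 0.0], 1, [0.0, 0.0], [1.0, 0.0]))
```

The recipe branch uses softmax because each image has exactly one dish category, while the ingredient branch uses independent sigmoids because a dish usually contains several ingredients. Whether the classification terms are applied only to real images or to both real and generated images is a design choice the abstract does not specify.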
ML-CookGAN: Multi-Label Generative Adversarial Network for Food Image Generation