ABSTRACT
The success of StyleGAN has enabled unprecedented semantic editing capabilities on both synthesized and real images. However, such editing operations are either trained with semantic supervision or annotated manually by users. In a parallel development, the CLIP architecture has been trained on internet-scale, loosely paired images and text, and has proven useful in a variety of zero-shot learning settings. In this work, we investigate how to effectively link the pretrained latent spaces of StyleGAN and CLIP, which in turn allows us to automatically extract semantically labeled edit directions from StyleGAN, finding and naming meaningful edit operations in a fully unsupervised setup, without additional human guidance. Technically, we propose two novel building blocks: one for discovering interesting CLIP directions and one for semantically labeling arbitrary directions in CLIP latent space. Since the setup assumes no predetermined labels, we require no additional supervised text or attributes to build the editing framework. We evaluate the effectiveness of the proposed method and demonstrate that the extraction of disentangled, labeled StyleGAN edit directions is indeed possible, revealing interesting and non-trivial edit directions.
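To make the two building blocks concrete, below is a minimal, illustrative sketch, not the paper's implementation: candidate directions are discovered as principal components of pairwise differences between CLIP image embeddings of generated samples, and an arbitrary CLIP-space direction is named by its most similar CLIP text embedding from a candidate vocabulary. The use of the openai/CLIP package and scikit-learn PCA, the function names, and the example vocabulary are all assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's method): discover candidate
# CLIP-space directions via PCA over pairwise differences of CLIP image
# embeddings, then label a direction with its closest CLIP text embedding.
# Assumes the openai/CLIP package: pip install git+https://github.com/openai/CLIP
import itertools

import clip
import torch
from sklearn.decomposition import PCA

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)


def find_clip_directions(images, n_directions=10):
    """PCA over pairwise CLIP-embedding differences of generated images.

    `images` is a batch of image tensors (e.g., StyleGAN samples already run
    through `preprocess`). Returns unit-norm candidate directions.
    """
    with torch.no_grad():
        emb = model.encode_image(images.to(device)).float().cpu()
    emb = emb / emb.norm(dim=-1, keepdim=True)
    # Differences between embedding pairs capture how samples vary.
    diffs = torch.stack([emb[i] - emb[j]
                         for i, j in itertools.combinations(range(len(emb)), 2)])
    pca = PCA(n_components=n_directions).fit(diffs.numpy())
    dirs = torch.from_numpy(pca.components_).float()
    return dirs / dirs.norm(dim=-1, keepdim=True)


def label_direction(direction, vocabulary):
    """Name a unit-norm CLIP-space direction by its most similar text embedding."""
    tokens = clip.tokenize(vocabulary).to(device)
    with torch.no_grad():
        text = model.encode_text(tokens).float().cpu()
    text = text / text.norm(dim=-1, keepdim=True)
    sims = text @ direction  # cosine similarity of each word to the direction
    best = sims.argmax().item()
    return vocabulary[best], sims[best].item()
```

For example, `label_direction(dirs[0], ["smile", "age", "eyeglasses"])` (with a hypothetical vocabulary) would return the word whose CLIP text embedding best aligns with the first discovered direction, together with its cosine similarity; the corresponding StyleGAN edit direction can then carry that label.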