MyStyle++: A Controllable Personalized Generative Prior

In this paper, we propose an approach to obtain a personalized generative prior with explicit control over a set of attributes. We build upon MyStyle, a recently introduced method that tunes the weights of a pre-trained StyleGAN face generator on a few images of an individual. This system allows synthesizing, editing, and enhancing images of the target individual with high fidelity to their facial features. However, MyStyle does not demonstrate precise control over the attributes of the generated images. We propose to address this problem through a novel optimization system that organizes the latent space in addition to tuning the generator. Our key contribution is to formulate a loss that arranges the latent codes, corresponding to the input images, along a set of specific directions according to their attributes. We demonstrate that our approach, dubbed MyStyle++, is able to synthesize, edit, and enhance images of an individual with great control over the attributes, while preserving the unique facial characteristics of that individual.


INTRODUCTION
Ever since the introduction of generative adversarial networks (GANs) [Goodfellow et al. 2014], there has been a growing interest in unconditional image synthesis, which has led to rapid improvements in the resolution and quality of the images generated by GAN-based approaches. In particular, StyleGAN [Karras et al. 2019, 2020, 2021], one of the most popular image generators, produces high-resolution results that are indistinguishable from real images. Building on the success of StyleGAN, a large number of methods [Abdal et al. 2021; Gal et al. 2022; Härkönen et al. 2020; Patashnik et al. 2021; Shoshan et al. 2021; Tov et al. 2021; Wang et al. 2021a, 2022; Wu et al. 2021] use it as a prior for semantic face editing and other image enhancement tasks, such as inpainting and super-resolution. However, the major problem with these approaches is that they use a general prior, trained on a large number of diverse identities. Therefore, their edited or enhanced images may not preserve the identity and key facial features of the target person.
The recent approach by Nitzan et al. [2022], coined MyStyle, addresses this issue by personalizing the generative prior for an individual of interest. Specifically, given a few images of a person, MyStyle first projects these images into the latent space of a pre-trained StyleGAN to obtain a set of latent vectors, called anchors. It then tunes the generator by minimizing the error between the synthesized anchor images and their corresponding input images. Through this process, the generator becomes highly tuned to reconstruct the individual of interest with high fidelity in the specific regions of the latent space covered by the anchors. MyStyle produces impressive results, preserving the identity and facial features of the target individual, for various tasks such as synthesis, semantic editing, and image enhancement.
Fig. 2. On the top, we show the editing results of MyStyle using the expression direction from InterFaceGAN [Shen et al. 2020]. Since the original direction does not reside within the personalized subspace, editing with this direction produces results with altered identity (rightmost image). By performing the edit using the projected direction, the identity is better preserved, but the expression becomes entangled with the yaw angle. Our method preserves the identity and keeps the other attributes intact while removing the smile.
However, this technique does not demonstrate precise control over the attributes of the generated images. For example, to synthesize an image with a particular set of attributes, one should randomly sample the convex hull of the anchor points until a desired image is reached by chance. For image editing, MyStyle uses the editing directions provided by approaches such as InterFaceGAN [Shen et al. 2020] to offer controllability over the attributes of the generated images. Since these editing directions are learned over the entire domain, they may not reside within the personalized subspace. As shown in Fig. 2 (top), by performing the edits using the original direction, the latent codes will quickly fall outside the personalized subspace, producing images with a different identity. To address this issue, MyStyle personalizes the editing direction by projecting it into the subspace. While the projected edit direction keeps the latent codes within the personalized subspace, it loses the ability to perform disentangled edits. As shown in Fig. 2 (middle), removing the expression also results in changing the yaw angle.
Our goal is to address these problems by providing full control over a set of pre-defined attributes of the generated images. To this end, we make a key observation that anchors corresponding to a single person are usually clustered together in a small region within the latent space. Therefore, we can organize the latent space within that region by rearranging the anchors. Since a generator like StyleGAN tends to preserve the smoothness of the output variation over the latent space, rearranging the anchors causes the space in between to be dragged with them, resulting in an organized latent space.
Armed with this observation, we propose a novel optimization system to personalize a generative prior by both tuning the generator and organizing the latent space through optimizing the anchors. Our key contribution is to formulate a loss function that arranges the anchors with specific attributes along a particular direction in the latent space. Specifically, we project the anchors onto a set of principal axes and minimize the variance of the projection for all the anchors with the same attribute. By doing so, the generator becomes highly tuned to one individual, while the attributes can be controlled within a small hypercube in the latent space.

Fig. 3. Illustration of our data organization with two attributes (M = 2), yaw and expression. We quantize the range of continuous attributes to obtain a set of discrete levels a_{k,l} across all attributes. The estimated attributes for each image are then assigned to their nearest discrete level.
We demonstrate that our proposed method, called MyStyle++, allows synthesizing images with high fidelity to the characteristics of one individual, while providing full control over a set of pre-defined attributes. We also show that our method can better disentangle different attributes compared to MyStyle [Nitzan et al. 2022]. Moreover, we demonstrate that our system can produce images with a desired attribute during image enhancement.

RELATED WORK

Deep Generative Networks
Generative Adversarial Networks (GANs) consist of two main modules: a generator and a discriminator [Goodfellow et al. 2014]. The generator takes a noise vector as input and tries to capture the distribution of true examples, focusing on producing an output that fools the discriminator, whose purpose is to classify whether the output is real or fake. GANs have been used extensively to synthesize images that are in line with the training data distribution [Brock et al. 2019; Karras et al. 2018, 2019; Zhu et al. 2017]. Among different variants, StyleGAN [Karras et al. 2019, 2020, 2021], which is a carefully re-designed generator architecture, produces results that are indistinguishable from real photographs, particularly for human faces. In our work, we use StyleGAN2 [Karras et al. 2020] as the base network and personalize it by tuning the generator and organizing the latent space.

Controllable GANs
StyleGAN generates photorealistic portrait images of faces, but it lacks control over semantic face parameters, such as face pose, expressions, and scene illumination. Recently, many StyleGAN variants [Abdal et al. 2021; Gal et al. 2022; Härkönen et al. 2020; Patashnik et al. 2021; Shoshan et al. 2021; Tov et al. 2021; Wang et al. 2021a, 2022; Wu et al. 2021] have been introduced to address this problem. For example, StyleFlow [Abdal et al. 2021] proposes flow models for non-linear exploration of a StyleGAN latent space.
GANSpace [Härkönen et al. 2020] attempts to analyze the GAN space by identifying latent directions based on principal component analysis (PCA), applied either in latent space or feature space.
Most controllable portrait image generation methods [BR et al. 2021; Ji et al. 2022; Sun et al. 2022; Tewari et al. 2020a,b; Wang et al. 2021b; Zhou et al. 2019] either rely on 3D morphable face models (3DMMs) [Blanz and Vetter 1999] to achieve rig-like control over StyleGAN, or utilize another modality as guidance (e.g., facial landmarks and audio) to control the generation. For instance, by building a bijective mapping between the StyleGAN latent code and the 3DMM parameter sets, StyleRig [Tewari et al. 2020b] achieves the controllable parametric nature of existing morphable face models and the high photorealism of generative face models. Ji et al. [2022] propose an approach to generate one-shot emotional talking faces controlled by an emotion source video and an audio clip.
To explicitly control the camera, several algorithms propose generative neural radiance fields [Chan et al. 2021; Deng et al. 2022] to produce 3D images. Others [Bergman et al. 2022; Simsar et al. 2022] extend these ideas by providing the ability to control other attributes of 3D GANs.
Unfortunately, all the approaches discussed in this section either struggle to retain crucial facial features (identity) after editing [Nitzan et al. 2022] or are unable to maintain explicit control over fully disentangled attributes.

Few-shot GANs and Personalization
Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, many works [Chen et al. 2021; Liu et al. 2019; Nitzan et al. 2022; Ojha et al. 2021; Wang et al. 2019; Zakharov et al. 2019] seek to further improve the generation quality by adapting a pre-trained model to few-shot image samples. Zakharov et al. [2019] propose a framework that performs lengthy meta-learning on a large dataset of videos. After this training, the method is able to frame few- and one-shot learning of neural talking head models of previously unseen people as adversarial training problems with high-capacity generators and discriminators. The appearance information of the unseen target person is learned by the adaptive instance normalization layers. More recently, MyStyle [Nitzan et al. 2022] tunes the weights of a pre-trained StyleGAN face generator to form a local, low-dimensional, personalized manifold in the latent space from a small reference set of portrait images of the target person. The synthesized images within the adapted personalized latent space have better identity-preserving ability compared with the original StyleGAN. However, MyStyle does not demonstrate precise control over the attributes of the generated images. We focus on addressing this issue by organizing the personalized subspace according to a set of pre-defined attributes.

ALGORITHM
Given a few images of an individual with a set of corresponding attributes, our goal is to obtain a personalized generative prior that allows us to synthesize images of that individual with high fidelity and full control over the desired attributes. Specifically, we use the pre-trained StyleGAN [Karras et al. 2019, 2020] face generator and adapt it to the target individual through a novel optimization system. During tuning, we organize the latent space by optimizing the anchors according to the attributes, so that we can easily sample an image with a desired set of attributes. Additionally, we optimize the generator to ensure it can produce images that are faithful to the characteristics of the target individual. Below, we discuss our approach in detail, beginning with our data pre-processing.

Fig. 4. On the top, we show an overview of our controllable personalization approach. Given a set of input images of an individual with their corresponding attributes, we first encode them into the W space of StyleGAN to obtain a set of anchors, shown with circles. Note that the colors indicate the attributes of the images: different views are indicated with yellow, orange, and red, while different expressions are shown in green and blue. We then minimize an objective function consisting of anchor (L_anc) and reconstruction (L_rec) losses to organize the latent space by updating the anchors, while tuning the generator. After optimization, we obtain an organized latent space W* that can be easily sampled according to a set of attributes, and a tuned generator that can produce images that are faithful to the facial characteristics of the target individual.

Data Pre-processing
Given a set of N images of an individual, we first follow the pre-processing steps of MyStyle [Nitzan et al. 2022] to align, crop, and resize the images. We then estimate a set of M pre-defined attributes (e.g., yaw and expression) for each image. Certain attributes have a discrete domain, while others are continuous. We leave the discrete attributes unchanged, but quantize the range of the continuous ones to obtain levels a_{k,l}, where k ∈ {1, ..., M} refers to the attribute type, while l ∈ {1, ..., L(k)} is the index of the attribute value. Note that the number of quantization levels L(k) could be different for each attribute k. The estimated attributes for each image are then snapped to the nearest quantized values. A simple example illustrating this process is shown in Fig. 3. We provide more details on the attributes and our quantization strategy in Sec. 4.
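As a concrete illustration, the snapping of a continuous attribute to its nearest discrete level can be sketched as follows (a minimal sketch: the helper name and the uniform-step assumption are ours; the paper's actual quantization steps are given in Sec. 4):

```python
import numpy as np

def quantize_attribute(values, step):
    """Snap continuous attribute values to the nearest multiple of `step`.

    Hypothetical helper: the paper quantizes, e.g., yaw and pitch every
    5 degrees and age every 2 years; discrete attributes are left as-is.
    """
    values = np.asarray(values, dtype=float)
    return np.round(values / step) * step

# Example: yaw angles snapped to 5-degree levels.
yaw_levels = quantize_attribute([-12.3, -3.1, 0.4, 7.8, 22.6], step=5.0)
```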

Controllable Personalization
We begin by projecting the input images into the latent space of StyleGAN, using the pre-trained encoder by Richardson et al. [2021], to obtain a set of N latent codes {w_i}, i = 1, ..., N. We follow the terminology of MyStyle [Nitzan et al. 2022] and call these latent codes anchors. As discussed, in addition to tuning the generator to improve its fidelity to the target individual, we would like to organize the latent space to have full control over a set of attributes. An overview of our approach is shown in Fig. 4.
Our key observation is that we can organize the latent space by only rearranging the anchors. This is because the output of StyleGAN changes smoothly with respect to the input, and thus as an anchor moves, its neighborhood will be dragged with it. Based on this observation, we formulate an anchor loss to rearrange the anchors based on their attributes.
Before explaining our anchor loss in detail, we discuss the properties of an ideal latent space: 1) Each attribute should change along a known direction, d_k for the k-th attribute. This ensures that we can perform semantic editing and change a particular attribute by simply modifying a latent code along that attribute's direction. 2) All the latent codes that project to the same value along an attribute direction should have the same attribute. For example, all the latent codes that project to 0.5 along the yaw direction should correspond to images of frontal faces. This allows us to directly sample an image with a certain set of attributes by ensuring that the latent code projects to appropriate values along each attribute direction. 3) The directions for different attributes should be orthogonal to guarantee that the attributes are fully disentangled and changing one will not result in modifying the others.
We propose to codify the three properties into the following anchor loss:

L_anc = Σ_{i=1}^{N} Σ_{k=1}^{M} (w_i · d_k − μ_{k,i})².   (1)

Here, w_i · d_k computes the projection of the anchor for the i-th image onto the direction of the k-th attribute through a dot product. Moreover, μ_{k,i} is the average of the projected anchors along direction d_k for all the images with the same k-th attribute as the i-th image (a subset denoted N_{k,i}). Formally, we can write this as

μ_{k,i} = (1 / |N_{k,i}|) Σ_{j ∈ N_{k,i}} w_j · d_k,   (2)

where

N_{k,i} = { j : A(I_j)[k] = A(I_i)[k] }.   (3)

Here, A(I_j)[k] returns the quantized k-th attribute of image I_j. We note that μ_{k,i} changes at every iteration of the optimization. By minimizing the loss in Eq. 1, we ensure that all the anchors with the same k-th attribute project to the same point along the k-th attribute direction d_k, satisfying our second desired property. This loss also ensures that each attribute is changed along its specific direction, satisfying the first property. This can be seen visually in Fig. 3; for example, if all the images with a specific yaw (each column) project to the same point in the yaw direction, moving along this direction will change the yaw.
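A minimal numpy sketch of this variance-of-projection loss (function name, array shapes, and the toy data are our assumptions; the actual system evaluates this on StyleGAN latents during optimization):

```python
import numpy as np

def anchor_loss(anchors, directions, labels):
    """Sketch of the anchor loss: for each attribute k, all anchors that
    share the same quantized level should project to the same point along
    direction d_k, so we penalize the squared deviation of each projection
    from its per-level mean.

    anchors:    (N, D) latent codes w_i
    directions: (M, D) attribute directions d_k
    labels:     (N, M) quantized attribute level per image and attribute
    """
    loss = 0.0
    for k, d in enumerate(directions):
        proj = anchors @ d                 # projection of every anchor onto d_k
        for level in np.unique(labels[:, k]):
            p = proj[labels[:, k] == level]
            loss += np.sum((p - p.mean()) ** 2)
    return float(loss)

# A perfectly organized 2-D toy space has zero anchor loss.
anchors = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
directions = np.eye(2)                     # d_1 = "yaw" axis, d_2 = "expression" axis
labels = anchors.astype(int)               # levels coincide with coordinates here
organized = anchor_loss(anchors, directions, labels)
```

Perturbing any anchor off its level's shared projection makes the loss positive, which is exactly the signal that drives the rearrangement.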
To satisfy the third property, we apply principal component analysis (PCA) to all the N anchors and use a subset of the principal components as our directions d_k. We assign a specific principal component to each d_k through the following objective:

d_k = argmin_{v ∈ V} L_anc(v, k),   (4)

where V is the set of all the principal components and L_anc(v, k) denotes the k-th attribute's term of the anchor loss in Eq. 1, evaluated with v as the candidate direction. The intuition behind this is that we would like to perform the least amount of rearrangement by ensuring that the latent space is already well aligned with respect to the selected directions. Note that we perform PCA at every iteration of training. Therefore, as we rearrange the anchor points in different iterations, the directions will be updated as well. We also note that although the objective in Eq. 4 could potentially assign different principal components to a particular attribute direction d_k in different iterations, we did not observe this phenomenon in our experiments.
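This assignment step can be sketched as follows (our assumptions: numpy SVD stands in for PCA, and a per-attribute version of the anchor loss scores each candidate component; the real system re-runs this every iteration on the current anchors):

```python
import numpy as np

def per_attribute_loss(anchors, d, levels):
    """Anchor loss restricted to one attribute and one candidate direction."""
    proj = anchors @ d
    return sum(np.sum((proj[levels == l] - proj[levels == l].mean()) ** 2)
               for l in np.unique(levels))

def assign_directions(anchors, labels):
    """For each attribute, pick the principal component of the anchors that
    already yields the smallest anchor loss (a sketch of Eq. 4)."""
    centered = anchors - anchors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows: principal axes
    return np.stack([
        vt[int(np.argmin([per_attribute_loss(anchors, v, labels[:, k])
                          for v in vt]))]
        for k in range(labels.shape[1])
    ])

# Toy anchors where attribute 0 varies along x and attribute 1 along y.
anchors = np.array([[0., 0.], [0., 1.], [3., 0.], [3., 1.]])
labels = (anchors > 0.5).astype(int)
directions = assign_directions(anchors, labels)
```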
To perform personalization, we minimize the combination of the anchor and reconstruction losses,

L = L_anc + L_rec,   (5)

where the reconstruction loss L_rec minimizes the error between the synthesized images G(w_i) and the corresponding input images I_i. We follow MyStyle and use a combination of LPIPS [Zhang et al. 2018] and L2 as our reconstruction loss. During optimization, both the latent codes corresponding to the anchors and the weights of the generator are updated. Note that in addition to adapting the generator to the input image set, the reconstruction loss plays a critical role in avoiding trivial solutions to the anchor loss, e.g., collapsing all the anchors to a single point.
Once the optimization is performed, we obtain an organized latent space W* and a tuned generator G*. All the attributes can be controlled within an M-dimensional hypercube in the organized latent space. The bounds of this hypercube can simply be found by projecting all the anchors onto each axis of the hypercube, d_k, and computing the minimum and maximum values. Note that all the other attributes, not being used during optimization, are encoded in the remaining PCA dimensions.
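Computing these bounds is a matter of projecting every organized anchor onto each direction (a minimal sketch; the function name and orthonormal directions are our assumptions):

```python
import numpy as np

def hypercube_bounds(anchors, directions):
    """Per-attribute control range: min and max projection of the
    organized anchors onto each attribute direction d_k."""
    proj = anchors @ directions.T          # (N, M) coordinates in the hypercube
    return proj.min(axis=0), proj.max(axis=0)

anchors = np.array([[0.2, 1.0], [0.8, -0.5], [0.5, 0.3]])
lo, hi = hypercube_bounds(anchors, np.eye(2))
```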

Controllable Synthesis, Edit, and Enhancement
We now describe how to use our personalized generative prior for various tasks.
Synthesis: Controlling the synthesized images can easily be done by ensuring that the sampled latent code projects to the desired location in the M-dimensional hypercube. However, special care must be taken to ensure the latent code does not fall outside of the personalized space. Following MyStyle, we define the convex hull of all the organized anchors w*_i as the personalized subspace within W*. This convex hull is represented through generalized barycentric coordinates as the weighted sum of the anchors, where the weights (coordinates) α = {α_i}, i = 1, ..., N, sum up to 1 and are greater than −β (β is a positive value). The latter condition dilates the space by a small amount to ensure expressiveness.
We propose a simple strategy to perform controlled sampling in the dilated convex hull. Specifically, we first randomly sample α to ensure the latent code is within the personalized subspace. We then project the sampled latent code into PCA and set the projected values along the attribute directions d_k to the desired values. Note that, while it is possible for the modified latent codes to fall outside the dilated convex hull and require reprojection to the personalized space, we did not observe such cases in practice. This is mainly because our latent space is organized according to the attributes and our modifications are performed inside a hypercube which is part of the subspace.
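The sampling strategy above can be sketched as follows (our assumptions: a Dirichlet draw followed by dilation produces weights that sum to 1 and are bounded below by −β, and the attribute directions are orthonormal so projections can be overwritten directly):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_controlled(anchors, directions, targets, beta=0.02):
    """Sketch of controlled sampling: draw dilated barycentric weights
    (>= -beta, summing to 1), form the latent code, then overwrite its
    projections along each (assumed orthonormal) attribute direction
    with the desired target values."""
    n = len(anchors)
    alpha = rng.dirichlet(np.ones(n))       # non-negative, sums to 1
    alpha = (1 + n * beta) * alpha - beta   # dilate: >= -beta, still sums to 1
    w = alpha @ anchors
    for d, t in zip(directions, targets):
        w = w + (t - w @ d) * d             # snap the projection along d to t
    return w

anchors = np.array([[0., 0., 1.], [1., 0., 0.], [0., 1., 0.], [1., 1., 1.]])
directions = np.eye(3)[:2]                  # control the first two axes
w = sample_controlled(anchors, directions, targets=[0.25, 0.5])
```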
Semantic Editing: Since our latent space is organized, the editing process for sampled images is straightforward. To edit an image, we project its latent code into PCA and perform the edit by changing its coordinate in the hypercube. To edit a real image I, we first project the image into the α space through the following objective:

α* = argmin_α L_rec(G(W* α), I),   (6)

where W* is a matrix with the organized anchors along its columns. Note that we follow MyStyle's approach to ensure the α values satisfy the conditions of the dilated convex hull, i.e., they sum up to 1 and are greater than −β. Once we obtain the optimized latent code, following Roich et al. [2022], we further tune the generator to better match the input image. We then perform the semantic edits by changing the latent code in PCA.
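The core of this projection, finding barycentric weights over the anchors, can be sketched as a least-squares problem (a simplification on our part: we match a target latent code directly, solve the sum-to-one constraint in closed form, and omit both the ≥ −β bound and the image-space reconstruction loss):

```python
import numpy as np

def project_to_anchors(target, anchors):
    """Find barycentric weights alpha (summing to 1) whose combination of
    anchors best matches the target latent code, via least squares."""
    W = anchors.T                           # (D, N): anchors along columns
    base = W[:, -1]
    A = W[:, :-1] - base[:, None]           # substitute alpha_N = 1 - sum(others)
    a, *_ = np.linalg.lstsq(A, target - base, rcond=None)
    return np.append(a, 1.0 - a.sum())

anchors = np.array([[0., 0.], [1., 0.], [0., 1.]])
alpha = project_to_anchors(np.array([0.2, 0.3]), anchors)
```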
Image Enhancement: Given an input image x with a known degradation function φ, our goal is to enhance the image while controlling the attributes of the reconstructed image. We propose to do this through the following objective:

α* = argmin_α L_rec(φ(G(W* α)), x) + λ Σ_k ((W* α) · d_k − t_k)²,   (7)

where λ controls the balance between the two terms and we set it to one in our implementation. Here, the first term ensures that the generated image, after applying the degradation function, is similar to the input image. The second term encourages the projection of the latent code W* α onto the k-th attribute direction to be similar to the desired value t_k. Note that we can perform enhancement while controlling a subset of the attributes by only applying the second term to the attributes of interest. Similarly, for uncontrolled enhancement, we simply remove the second term.
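The structure of this objective can be sketched with stand-in components (the toy generator, degradation, and squared-error data term below are placeholders of ours, not the paper's StyleGAN and LPIPS+L2 formulation):

```python
import numpy as np

def enhancement_loss(w, x, generator, degrade, directions, targets, lam=1.0):
    """Sketch of the controllable enhancement objective: a data term on the
    degraded synthesis plus a term pinning the latent's projection along
    each controlled attribute direction to its desired value."""
    data = np.sum((degrade(generator(w)) - x) ** 2)
    attr = sum((w @ d - t) ** 2 for d, t in zip(directions, targets))
    return float(data + lam * attr)

# Toy example: identity "generator", degradation keeps the first two entries.
w = np.array([1.0, 2.0, 3.0])
loss = enhancement_loss(
    w, x=np.array([1.0, 1.0]),
    generator=lambda z: z, degrade=lambda y: y[:2],
    directions=[np.array([0.0, 0.0, 1.0])], targets=[2.0])
```

Dropping the attribute term (or applying it to a subset of directions) recovers uncontrolled and partially controlled enhancement, as in the text.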

RESULTS
We implement the proposed approach in PyTorch and adopt the ADAM optimizer [Kingma and Ba 2015] with the default parameters. All the results are obtained after tuning a pre-trained StyleGAN2 [Karras et al. 2020] generator on the FFHQ [Karras et al. 2019] dataset. We perform the tuning for 3000 epochs with a batch size of one and a learning rate of 5e-3 across all datasets.
We have tested our system on the following individuals: Barack Obama (93 images), Emma Watson (304 images), Joe Biden (142 images), Leonardo DiCaprio (217 images), Michelle Obama (138 images), Oprah Winfrey (106 images), Scarlett Johansson (179 images), and Taylor Swift (129 images). We consider the expression, as well as the yaw and pitch angles, as the attributes for all individuals. For Leonardo DiCaprio and Emma Watson, we include age in addition to the other three attributes. Throughout this section, we show our results on some of these individuals, but more results can be found in the supplementary materials.
We estimate the expression, yaw, and pitch by leveraging the AWS Rekognition API [Amazon 2023], while we employ the DEX VGG network [Rothe et al. 2018] to estimate the age attribute. We quantize the yaw and pitch angles every 5 degrees and age every 2 years during the data pre-processing stage, described in Sec. 3.1. For expression, we utilize a combination of the "Smile" and "MouthOpen" attributes of the AWS output, which indicate the presence of the attribute as true or false with a confidence level ranging from 50 to 100. We divide the confidence level by 20 and round it down to the nearest integer, resulting in three groups of presence and three groups of absence for each attribute. We then combine the lowest groups of presence and absence (presence and absence with 50% to 60% confidence) into the same group, resulting in five quantization levels for both "Smile" and "MouthOpen". The images with the same "Smile" and "MouthOpen" quantization levels are then grouped together.
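One possible reading of this grouping scheme is sketched below (the exact group boundaries and the signed-level encoding are our assumptions, not stated in the paper):

```python
def expression_level(present, confidence):
    """Hypothetical sketch of the expression quantization: confidence in
    [50, 100] is floored into 20%-wide groups (50-59, 60-79, 80-100), and
    the lowest presence and absence groups are merged into one neutral
    level, giving five signed levels in total: -2, -1, 0, 1, 2."""
    group = min(int(confidence // 20), 4) - 2   # 0, 1, or 2
    return group if present else -group          # merged lowest group -> 0

levels = [expression_level(True, 55), expression_level(False, 58),
          expression_level(False, 72), expression_level(True, 100)]
```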
We compare our approach against two versions of MyStyle, called MyStyle_I and MyStyle_P, where the editing directions are obtained from InterFaceGAN [Shen et al. 2020] and PCA (using Eq. 4), respectively. Note that in MyStyle_P we do not organize the latent space and only tune the generator, i.e., we minimize the reconstruction loss, but not the anchor loss. Although MyStyle does not demonstrate controllable synthesis, we use the approach discussed in Sec. 3.3 with the directions from InterFaceGAN and PCA to imbue MyStyle with this capability.
Here, we show a subset of our results, but more comparisons and evaluations can be found in our accompanying video and supplementary materials.
Synthesis: We begin by comparing our controllable synthesis results for Oprah Winfrey, Barack Obama, Scarlett Johansson, and Leonardo DiCaprio against MyStyle_I and MyStyle_P. For each person, we show a set of results by fixing one attribute and randomly sampling the rest. As shown in Fig. 5, both MyStyle_I and MyStyle_P produce results with large variations in the attribute of interest, because the directions from InterFaceGAN [Shen et al. 2020] and PCA do not match the correct attribute directions in the personalized subspace. For example, on the top, a large smile is expected, whereas images generated by MyStyle_I and MyStyle_P exhibit a range of different expressions. While yaw is usually the dominant attribute in the latent space and relatively easy to control, MyStyle_I and MyStyle_P exhibit undesirable yaw variance for Barack Obama. Similarly, these baselines produce results with large pitch and age variations for Scarlett Johansson and Leonardo DiCaprio, respectively. In contrast, our approach produces results that are consistent in all four cases. Note that InterFaceGAN does not provide a direction corresponding to the pitch, and thus we only compare against MyStyle_P for the case with fixed pitch.
Table 1. We numerically compare our controlled synthesis results against MyStyle_P and MyStyle_I. We generate 100 images for each fixed attribute value and report the standard deviation of the estimated attribute of interest over the generated images. Note that the attribute values (e.g., 0.25) are in normalized coordinates. The best results are shown in bold. Note that we do not report any fixed-pitch synthesis results for MyStyle_I as InterFaceGAN [Shen et al. 2020] does not provide an edit direction for pitch.

We further numerically evaluate the ability of our method to control the attributes in comparison with MyStyle_P and MyStyle_I in Table 1. To accomplish this, we generate 100 images by fixing one attribute and randomly sampling the other ones. We then estimate the attributes of the generated images, using AWS Rekognition for the expression as well as the yaw and pitch angles, and DEX VGG [Rothe et al. 2018] for age, and compute the standard deviation of the estimated attribute over all 100 images. For each attribute, we show the results for five normalized values (0.0, 0.25, 0.5, 0.75, 1.0). As seen, MyStyle_P and MyStyle_I generate inferior results as the PCA and InterFaceGAN attribute directions are not well-aligned with the correct attribute directions in the subspace. In contrast, our approach consistently demonstrates the smallest standard deviation across all attributes for both Scarlett Johansson and Leonardo DiCaprio.

Table 2. We compare our results against MyStyle in terms of the ID metric [Nitzan et al. 2022] and diversity score [Ojha et al. 2021]. Higher numbers are better. Our method produces similar results compared to MyStyle, which demonstrates that latent organization does not hurt the quality of our results. The best results are shown in bold.

A potential concern is whether our latent space organization could compromise the diversity and identity preservation of the results. To numerically evaluate this, we compute the ID metric, as proposed in MyStyle [Nitzan et al. 2022], on the results generated by both our approach and MyStyle for Scarlett Johansson and Leonardo DiCaprio. This metric measures the cosine similarity of the features extracted by a deep classifier between the generated image and the closest one from the training data. Besides measuring the ability to preserve the identity, we also compute the diversity of the synthesized images. We follow the protocol suggested by Ojha et al. [2021] to compute the intra-cluster diversity using the LPIPS score. Specifically, we generate 1000 images and assign them to one of the 10 training images, using the lowest LPIPS distance.
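The assignment-and-averaging logic of this diversity protocol can be sketched generically, with a plain distance function standing in for LPIPS (the function names and the toy 1-D example are ours):

```python
import numpy as np
from itertools import combinations

def intra_cluster_diversity(generated, references, dist):
    """Sketch of the protocol of Ojha et al. [2021]: assign each generated
    sample to its nearest reference under `dist` (LPIPS in the paper),
    average pairwise distances within each cluster, then average over
    clusters."""
    clusters = [[] for _ in references]
    for g in generated:
        clusters[int(np.argmin([dist(g, r) for r in references]))].append(g)
    scores = [np.mean([dist(a, b) for a, b in combinations(c, 2)])
              for c in clusters if len(c) > 1]
    return float(np.mean(scores)) if scores else 0.0

# Toy 1-D example with absolute difference as the distance.
score = intra_cluster_diversity(
    generated=[0.0, 1.0, 9.0, 11.0], references=[0.0, 10.0],
    dist=lambda a, b: abs(a - b))
```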

Then we compute the average pairwise LPIPS distance within members of the same cluster and average over the 10 clusters. As shown in Table 2, our method generates results that are comparable to MyStyle in terms of the ID metric and diversity score, demonstrating that our latent space organization does not compromise the diversity and identity preservation of the results.

Semantic Editing: We begin by comparing our semantic editing results against MyStyle_P and MyStyle_I in Fig. 6. Specifically, we modify the expression, yaw, pitch, and age of Scarlett Johansson, Michelle Obama, Joe Biden, and Leonardo DiCaprio, respectively. MyStyle_P has difficulties editing Scarlett Johansson's expression and predominantly changes the yaw. While MyStyle_I is better able to edit the expression, it slightly changes the yaw (see the supplementary video) and produces a neutral face with altered identity (the leftmost image). Moreover, both MyStyle_P and MyStyle_I change the expression when editing Michelle Obama's yaw angle. For Joe Biden, MyStyle_P struggles to properly edit the pitch angle as the PCA direction is not well-aligned with the pitch attribute direction in the subspace. Finally, when editing the age of Leonardo DiCaprio, MyStyle_P and MyStyle_I exhibit noticeable changes to the expression and pitch, respectively. Additionally, both approaches struggle to preserve the identity of the edited images in extreme cases (rightmost for MyStyle_P and leftmost for MyStyle_I). In contrast to these techniques, our method only changes the attribute of interest when producing edited results and is able to better preserve the identity. Again, we note that we do not show pitch editing for MyStyle_I as InterFaceGAN does not provide a direction corresponding to the pitch attribute.
Next, we compare our method against the other techniques for editing real images of Barack Obama, Emma Watson, Scarlett Johansson, and Leonardo DiCaprio in Fig. 7. Both MyStyle_P and MyStyle_I have difficulties preserving the identity of Barack Obama when removing the smile. Additionally, MyStyle_P struggles to maintain the yaw angle. For Emma Watson, both MyStyle_P and MyStyle_I change the expression when editing the yaw angle. For Scarlett Johansson, MyStyle_P is unable to edit the pitch and instead modifies the yaw angle. Finally, MyStyle_P changes the yaw angle when editing Leonardo DiCaprio's age, while MyStyle_I has difficulties maintaining the identity. In contrast to these methods, our approach disentangles the attributes more effectively and is better at preserving the identities in all four cases.
We note that the reason behind MyStyle's occasional failure to preserve the identity is that the edited latent codes, in some cases, fall outside the personalized subspace. While the loss of identity can be resolved by projecting the edited latent codes back to the convex hull, using MyStyle's suggested strategy, this process produces results with undesirable attributes. This is shown in Fig. 8, where the objective is to completely remove Barack Obama's smile and produce a teenage Leonardo DiCaprio. MyStyle_I produces results with altered identities, as evident both visually and numerically through the ID metric. The identity is improved by projecting the edited latent codes to the subspace (third column), but this process increases the smile (top) and age (bottom).
We further numerically compare our real image editing results against MyStyle_P and MyStyle_I on Leonardo DiCaprio in Table 3. Specifically, we evaluate the editing consistency by computing the mean standard deviation of the edited attribute, while we measure the attribute disentanglement by calculating the mean standard deviation of the unedited attributes. The standard deviations are computed over 21 edits and they are averaged over 21 images. We additionally evaluate the ability of different methods to preserve the identity using the ID metric. As seen, our method consistently outperforms MyStyle_P and MyStyle_I across all metrics.
Image Enhancement: As discussed in Sec. 3.3, since our method provides precise control over the attributes, it can be used to perform controllable image enhancement. This is shown in Figs. 9 and 10 for image inpainting and super-resolution, respectively. As seen, our method can produce inpainted and super-resolved images with the desired expressions.
Analysis: We begin by evaluating the effect of the number of images on the quality of yaw-editing for Scarlett Johansson in Table 4. As seen, our system works reasonably well with 92 images, but the editing

LIMITATIONS AND FUTURE WORK
Our approach is able to produce high-quality results with great control over a set of attributes.However, it has a few limitations.
First, the number of images required for personalization increases significantly with the number of desired attributes. This is because we rely on the propagation of the anchors to the neighboring regions. If the anchors in certain regions are sparse, those areas are not going to be personalized appropriately. However, this is not unique to our approach, and MyStyle suffers from the same drawback. For example, if MyStyle is personalized with images of a young subject, it cannot produce images of the subject at an old age with high fidelity. Second, while our approach provides great control over the attributes, our reconstructions for attributes like view are not physically accurate. In the future, it would be interesting to incorporate the image formation process into our system to improve accuracy. We note that although our approach has the potential to be applied to cases beyond MyStyle, such as organizing the entire latent space of StyleGAN, one significant challenge arises: organizing the entire latent space necessitates a large number of anchor images, resulting in a time-consuming and difficult optimization. Furthermore, special attention must be given to prevent anchors with different identities from being placed closely together after optimization; this is not an issue when handling a single individual.

CONCLUSION
We have presented an approach to obtain a controllable personalized generative prior from a set of images of an individual. Our system allows for reconstructing images of the individual that faithfully preserve the key facial features of the person, while providing full control over a set of pre-defined attributes. In addition to tuning a pre-trained generator, we organize its latent space such that different attributes change along certain known directions. To do this, we formulate a loss that rearranges the latent codes, corresponding to the input images, according to the attributes. We show that our method better disentangles the attributes than MyStyle, while providing full control over the attributes.

Fig. 5. We present a comparison of our synthesis results with those of MyStyle_I and MyStyle_P for four individuals: Oprah Winfrey, Barack Obama, Scarlett Johansson, and Leonardo DiCaprio. Our method consistently produces attribute-controlled image synthesis, whereas MyStyle_I and MyStyle_P exhibit significant inaccuracies in attribute controllability.

Table 3. We compare our editing results against MyStyle_I and MyStyle_P in terms of the mean standard deviation of the edited attribute, to show editing consistency (marked with *), and of the fixed attributes, to demonstrate attribute disentanglement. We additionally report the ID metric to evaluate identity preservation ability. The best results are shown in bold.

Table 4. Evaluating the effect of the number of images on the quality of yaw-editing for Scarlett Johansson, in a manner akin to Table 3.