Using StyleGAN latent space regression to analyze and create image collages with GANs.



In this post, I will give a brief overview of the recent MIT paper "Using Latent Space Regression to Analyze and Leverage Compositionality in GANs".

❓What?

Given an incomplete image or a collage of images, generate a realistic image.


TL;DR: Train a regressor into the StyleGAN latent space, use it to embed a crude collage of images, and feed the resulting latent vector to StyleGAN to get a realistic image.

📌How?


  1. Train a regressor to predict the StyleGAN latent code, even from an incomplete image.
  2. Embed a collage with the regressor and feed the resulting latent code to the GAN.
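A minimal sketch of these two inference steps, assuming NumPy arrays of shape (H, W, C) and hypothetical `regressor` and `generator` callables that stand in for the trained regressor and the pretrained StyleGAN generator. Passing the mask to the regressor as an extra input channel is an assumption made here so that the regressor knows which pixels are filled; the toy linear stand-ins at the bottom only make the code runnable and are not real networks.

```python
import numpy as np

def make_collage(images, masks):
    """Paste regions from several same-sized images onto one canvas.

    images: list of (H, W, C) float arrays.
    masks:  list of (H, W) boolean arrays; later entries overwrite earlier ones.
    Returns the collage and an (H, W) float mask with 1.0 where a pixel was filled.
    """
    collage = np.zeros_like(images[0])
    filled = np.zeros(images[0].shape[:2])
    for img, m in zip(images, masks):
        collage[m] = img[m]
        filled[m] = 1.0
    return collage, filled

def regenerate(collage, mask, regressor, generator):
    """Step 1: regress a latent code from the collage; step 2: decode it with the generator."""
    m = mask[..., None]
    x_in = np.concatenate([collage * m, m], axis=-1)  # masked image + mask channel (assumption)
    w = regressor(x_in)   # one forward pass, no per-image optimization
    return generator(w)   # map the latent code back onto the image manifold

# Toy linear stand-ins (hypothetical) so the sketch runs end to end.
rng = np.random.default_rng(0)
H, W, C, K = 8, 8, 3, 16
B = rng.normal(size=(K, H * W * (C + 1))) * 0.01  # toy "regressor" weights
A = rng.normal(size=(H * W * C, K))               # toy "generator" weights
regressor = lambda x: B @ x.ravel()
generator = lambda w: (A @ w).reshape(H, W, C)

# Collage: top half from image a, bottom half from image b.
a, b = rng.random((H, W, C)), rng.random((H, W, C))
top = np.zeros((H, W), dtype=bool)
top[: H // 2] = True
collage, mask = make_collage([a, b], [top, ~top])
out = regenerate(collage, mask, regressor, generator)
print(out.shape)  # (8, 8, 3)
```

With real models, `regressor` and `generator` would be the trained encoder network and the frozen StyleGAN; the point of the sketch is only that inference is two forward passes with no optimization loop.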

This paper presents a simple approach: given a fixed, pretrained generator (e.g., StyleGAN), the authors train a regressor network to predict the latent code from an input image. To teach the regressor to predict the latent code for images with missing pixels, they mask random patches during training. Then, given an input collage, the regressor projects it into a reasonable location in the latent space, and the generator maps that latent code back onto the image manifold. According to the authors, this approach enables more localized editing of individual image parts than editing directly in the latent space.
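To see why masking during training helps, here is a toy sketch (not the authors' code): the frozen "generator" is a random linear map, the regressor is a linear model fitted by least squares with a latent-space loss only, and one random square patch is hidden in each training image. The image size, latent size, patch shape, and feeding the mask as an extra input are all assumptions for illustration; the actual method trains a deep network against a pretrained StyleGAN.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, K = 8, 8, 8                 # toy image size and latent dimension (assumptions)
A = rng.normal(size=(H * W, K))   # frozen toy "generator": x = A @ z

def generate(z):
    return (A @ z).reshape(H, W)

def random_patch_mask(patch=4):
    """1 = visible pixel, 0 = hidden; one random square patch is hidden."""
    m = np.ones((H, W))
    i, j = rng.integers(0, H - patch + 1), rng.integers(0, W - patch + 1)
    m[i:i + patch, j:j + patch] = 0.0
    return m

def features(x, m):
    # Regressor input: masked image, the mask itself, and a bias term.
    return np.concatenate([(m * x).ravel(), m.ravel(), [1.0]])

def make_dataset(n, masked):
    zs = rng.normal(size=(n, K))  # sample latent codes from the prior
    feats = []
    for z in zs:
        m = random_patch_mask() if masked else np.ones((H, W))
        feats.append(features(generate(z), m))
    return np.array(feats), zs

def fit_regressor(n, masked):
    # Least-squares fit of a linear regressor mapping features -> latent code.
    X, Z = make_dataset(n, masked)
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return coef

reg_masked = fit_regressor(2000, masked=True)
reg_full = fit_regressor(2000, masked=False)

# Evaluate both regressors on held-out images that each have a hidden patch.
X_test, Z_test = make_dataset(500, masked=True)
err_masked = np.mean(np.sum((X_test @ reg_masked - Z_test) ** 2, axis=1))
err_full = np.mean(np.sum((X_test @ reg_full - Z_test) ** 2, axis=1))
print(f"trained with masks: {err_masked:.3f}, trained without masks: {err_full:.3f}")
```

In this toy setup, the regressor fitted on masked images recovers the latent code of a partially hidden image more accurately than the one fitted only on complete images, which is the intuition behind masking random patches during training.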

Interesting findings

  • Even though the regressor is never trained on unrealistic, incoherent collages, it still projects them into reasonable latent codes.
  • The authors show that the generator's representation is already compositional in the latent code: altering a part of the input image changes the regressed latent code in such a way that mainly the corresponding part of the generated image changes.
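One simple way to probe this kind of locality is sketched below as a hypothetical helper (it is not the metric used in the paper): edit a region of the input, regenerate both the original and the edited image, and measure which fraction of the total output change falls inside the edited region. `pipeline` stands for any image-to-image callable, for example `generator(regressor(x))`; the two toy pipelines in the usage part are assumptions chosen to show both extremes.

```python
import numpy as np

def edit_locality(pipeline, image, edited, region):
    """Fraction of the output change that lies inside the edited region.

    pipeline: callable mapping an (H, W, C) image to a regenerated (H, W, C) image.
    region:   (H, W) boolean mask of the pixels that were edited in the input.
    Returns a value in [0, 1]; 1.0 means the output changed only where the input did.
    """
    diff = np.abs(pipeline(edited) - pipeline(image)).sum(axis=-1)  # per-pixel change
    inside = diff[region].sum()
    outside = diff[~region].sum()
    if inside + outside == 0:
        return 1.0
    return float(inside / (inside + outside))

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
region = np.zeros((8, 8), dtype=bool)
region[:4, :4] = True
edited = img.copy()
edited[region] = 1.0 - edited[region]  # invert the colors of one quadrant

identity = lambda x: x                           # perfectly local "pipeline"
global_mix = lambda x: 0.5 * x + 0.5 * x.mean()  # every output pixel depends on all inputs

print(edit_locality(identity, img, edited, region))    # 1.0
print(edit_locality(global_mix, img, edited, region))  # below 1.0: the edit leaks everywhere
```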

More results


  • Paint by Word. I recently wrote a blog post about the paper "Paint by Word", which introduces an image editing method where the user paints a mask and provides any text description to guide image generation in the masked region.

☑️ Conclusions

➕ Pros:

  • As input, we need only a single example of roughly how we want the generated image to look (it can be a collage of different images).
  • Requires only a single forward pass through the regressor and the generator, so it is fast, unlike iterative optimization approaches, which can take up to a minute to reconstruct an image (https://arxiv.org/abs/1911.11544).
  • Does not require any labeled attributes.
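The speed argument can be illustrated with a toy linear "generator" (an assumption for illustration, not StyleGAN): optimization-based inversion runs gradient descent on the reconstruction error and needs a generator forward and backward pass at every step, while a regressor, fitted once offline, inverts a new image in a single forward pass. Here the exact pseudo-inverse stands in for a trained regressor, which is only possible because the toy generator is linear; the step count and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 64, 8
A = rng.normal(size=(D, K)) / np.sqrt(D)  # toy frozen linear "generator": x = A @ z

def invert_by_optimization(x, steps=500, lr=0.2):
    """Iterative inversion: gradient descent on ||A z - x||^2."""
    z = np.zeros(K)
    for _ in range(steps):
        residual = A @ z - x          # generator forward pass
        z -= lr * 2 * A.T @ residual  # gradient of the squared error w.r.t. z (backward pass)
    return z, steps                   # number of generator passes used

# Amortized inversion: fitted once offline (the pseudo-inverse stands in for
# a trained regressor network), then a single forward pass per image.
R = np.linalg.pinv(A)

def invert_by_regression(x):
    return R @ x, 1

z_true = rng.normal(size=K)
x = A @ z_true
z_opt, calls_opt = invert_by_optimization(x)
z_reg, calls_reg = invert_by_regression(x)
print(calls_opt, calls_reg)  # 500 1
```

Both routes recover the same code in this toy; the difference is that the optimization cost is paid again for every new image, while the regressor's cost is paid once during training.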

Applications

  • Image inpainting.
  • Example-based image editing (incoherent collage -> realistic image) 🔥.
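Inpainting fits the same pipeline: hide the missing pixels, regress a latent code from what is visible, and let the generator fill in the rest. The sketch below assumes hypothetical `regressor` and `generator` callables and an optional step that pastes the known pixels back over the generated image; that blending step is an assumption, not necessarily part of the paper's method. The toy regressor solves least squares on the visible pixels of a linear "generator", which only works because the toy is linear.

```python
import numpy as np

def inpaint(image, mask, regressor, generator, keep_known=True):
    """Fill the missing pixels of `image` with a regressor + generator.

    mask: (H, W) array, 1 = known pixel, 0 = missing.
    keep_known: paste the known pixels back over the generated image
    (optional post-processing assumed here).
    """
    m = mask[..., None].astype(image.dtype)
    x_in = np.concatenate([image * m, m], axis=-1)  # masked image + mask channel
    out = generator(regressor(x_in))
    return m * image + (1 - m) * out if keep_known else out

# Toy linear "generator" and a hypothetical least-squares stand-in for the regressor.
rng = np.random.default_rng(0)
H, W, C, K = 8, 8, 3, 10
A = rng.normal(size=(H * W * C, K))
generator = lambda z: (A @ z).reshape(H, W, C)

def toy_regressor(x_in):
    img, m = x_in[..., :C], x_in[..., C:]
    visible = np.broadcast_to(m, img.shape).ravel() > 0
    z, *_ = np.linalg.lstsq(A[visible], img.ravel()[visible], rcond=None)
    return z

true = generator(rng.normal(size=K))  # an image the toy generator can produce
mask = np.ones((H, W))
mask[2:6, 2:6] = 0                    # hide a 4x4 hole in the middle
filled = inpaint(true, mask, toy_regressor, generator)
print(np.allclose(filled, true))  # True
```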

📎 References:

📝 arXiv paper: arxiv.org/abs/2103.10426
🧿 Project page: chail.github.io/latent-composi…
⚒ GitHub: Code
📔 Colab: Link

🌐 Related blog post: New DALL-E? Paint by Word


Feel free to ask me any questions in the comments below. Feedback is also much appreciated.

  • Join my Telegram channel so you don't miss other paper reviews like this one! @gradientdude
  • Follow me on Twitter @artsiom_s