Stylegan3 + CLIP - Some Tests [en/spa]

A few days ago Stylegan3 was released to the public. Less than 24h later, nshepperd, integrated CLIP into Stylegan3 in this notebook, pleasing many of us that where willing to see it becoming real. I have been doing some tests, and this video above is one example of them. It was funny that the AI recognized who is Xena, and it made a decent work with it. I'll mention several AI artist around, that might be interested on this.

Hace pocos días Stylegan3 pasó a estar disponible para el público. Menos de 24h más tarde, nshepperd, integró CLIP y Stylegan3 en este notebook, complaciendo a unos cuantos de nosotros que esperábamos con muchas ganas ver esto hacerse realidad. He estado haciendo varios test, y este video es uno de ellos. Fue divertido que la IA reconozca quien es Xena, y que hiciera un buen trabajo con ella. Mencionaré algunos artistas de la IA que hay por aquí, creo que esto les puede interesar.

_{cc: @castleberry @dbddv01 @lavista @juecoree @kaliyuga @eve66 @kitzune}

Testing Stylegan3+CLIP

AIs are evolving really quick, and Stylegan3 is one of the best examples. It's amazing what it does. I was missing some kind of control or interaction with it, so as I said above, felt very hyped when found this notebook, with a rudimentary implementation of CLIP. I shared the link to the original notebook, instead host a copy as I usually do. That's because I expect it gonna get many updates from its creator. When the creator will be done with it, I'll host a copy and do little improvements if need.

I ran several tests, wanted to know how the AI reacts to several text prompts, and target images as well. Here you have to choose between text or images, unlike the more evolved VQGAN and Diffusion notebooks around.

About the text prompt, is CLIP, so it works as usual. But a couple of things needs to be taken in consideration. Stlygan3 is focused in faces... so any prompt given should be keeping that in mind, although CLIP can understand whatever thing we give to it, Stylegan3 only knows how to make faces. Also, I have noticed that the AI doesn't seems to know a long collection of artists, that worked perfectly with VQGANs and diffusion models. According of what I read somewhere, it seems that this notebook is using a modified imagenet dataset, that's doesn't include everything as the original one. So I guess that is the reason for the issue I'm talking about.

About work with target images there is not much to say, the AI does a very decent work with them, more than any other AI. Of course, need to keep in mind that this AI only does faces properly. I made some tests using images from my previous portraits, and don't feel deceived.

I would like to remark how optimized is this notebook. Stylegan works with pretrained models, so to make images of 720x720 it just only consumed 5Gb of the GPU RAM. Just amazing!

For what I observed, this notebook follows two steps. The first one is just generate, almost instantly, 32 outputs and choose the one that matches better with the prompt. Then, the second one, so the AI starts to transform it to try to bring you the best result.

I have added ten results from my experiments, each one has notes/prompts below. They are as they came from the AI. Didn't want to edit them, as the goal of this post is who what the AI can do. Hope you like them, or at least they are useful as guidance. Regards, and see you in my next post!

Las IA's están evolucionando muy rápido, y Stylegan3 es uno de los mejores ejemplos. Es increíble lo que hace. Estaba echando de menos alguna forma de controlar o interactuar con ella, y cómo dije más arriba me sentí muy entusiasmado cuando encontré este notebook, con una implementación de CLIP bastante rudimentaria. He compartido el enlace al notebook original, al contrario de una copia como hago habitualmente. Esto es por que espero que el creador lo vaya actualizando. Cuando ya esté acabado, alojaré una copia y le iré haciendo pequeños mejoras, si es necesario.

Hice varios test, ya que quería saber como la IA reacciona a prompts distintos y también imágenes objetivo. hay que elegir entre texto o imágenes, al contrario que en otros notebooks que trabajan con VQGAN y Diffusion, que están más desarrollados.

Acerca de usar textos, es CLIP, así que funciona como siempre. Pero hay que tener un par de cosas en cuenta. Stylegan3 es para hacer caras... así que cualquier prompt debe ser pensado teniendo en cuenta eso, ya que aunque CLIP entiende cualquier cosa que le digas, Stylegan3 sólo sabe hacer caras. También me he dado cuenta, de que no reconoce una larga lista de artistas. Por lo que he leído en algunos sitios, parece que este notebook esta usando un versión modificada de imagenet dataset, que no incluye todo lo que trae el original. Así que supongo que esta es la razón de que suceda lo que he dicho.

No tengo mucho que decir acerca del trabajo que la IA hace con imágenes objetivo, el resultado es más que decente, mucho más que el de otras IAs. Por supuesto, hay que recordar que esta IA sólo sabe hacer caras de una forma apropiada. Hice varios test con imágenes de mis anteriores retratos, y no me siento decepcionado.

Quiero remarcar lo bien optimizado que está este notebook. Stylegan usa modelos pre entrenados, así que para hacer imágenes de 720x720, sólo ha consumido 5Gb de RAM de la GPU. Maravilloso!

Por lo que he podido ver, este notebook sigue dos pasos. El primero es generar, casi instantáneamente, 32 resultados y escoger el que se ajusta más al prompt. El segundo paso, es transformar dicha imagen, intentando dar un resultado que se ajuste lo más posible a lo que hemos pedido..

He añadido diez resultados de mis experimentos, cada uno de ellos con notas/prompts debajo. Son tal cual la AI los ha creado. No he querido editarlos, pues la misión de esta publicación es mostrar lo que la IA puede hacer. Espero que os gusten, o que al menos sean útiles como guía. Saludos,¡y nos vemos en mi siguiente publicación!.

Notebooks

_{Enhanced Original VQGAN+CLIP Notebook}

_{Mse regularized Modified VQGAN+CLIP Notebook}

_{VQGAN+CLIP with Video Features Notebook}

_{Real ESRGAN 4x Notebook}

AI Artworks

descarga (12).png

_{This is a "normal" face created in the first step.}
_{Esto es una cara "normal" de las que son creadas en el primer paso.}

_{"A guy rendered in unreal engine" is the prompt for this one.}
_{"A guy rendered in unreal engine" es el prompt para esta imagen.}

_{"A dark priestess" is the one I used for this.}
_{"A dark priestess" es el que he usado para esta.}

_{Let's try to mix the two prompts above, "A dark priestess rendered in unreal engine".}
_{Probemos a mezclar los dos prompts que hemos usado antes, "A dark priestess rendered in unreal engine".}

_{Here I used "A pilgrim".}
_{Aquí he usado "A pilgrim".}

descarga (5).png

_{And here used "A woman by Francisco de Goya".}
_{Y aquí he usado "A woman by Francisco de Goya".}

Xena, the Warrior Princess.png

_{What about "Xena, the Warrior Princess"? You already met her in the header of this post.}
_{Qué tal "Xena, the Warrior Princess"? Ya la habéis conocido en la cabecera de esta publicación..}

descarga (11).png

_{And to finish, some images where I used some of my previous portraits as target images.}
_{Y para acabar, algunas imágenes que he creado usando retratos que tenía hechos como imágenes objetivo.}

_{All the content of this post is from my own.}
_{Todo el contenido de este post es de mi autoría.}

_{Images generated with Stylegan3+CLIP, served without anything else.}
_{Imágenes generadas Stylegan3+CLIP, servidas sin guarnición.}

_{100% AI free writing.}
_{Textos 100% libres de IA.}