Apply JoJoGAN on car #27

Open
HOKINGLOK opened this issue Apr 4, 2022 · 7 comments

Comments

@HOKINGLOK

Hi all, has anyone tried to apply JoJoGAN to car images (or other kinds of images)? I tried replacing both the e4e pretrained weight file and the StyleGAN2 pretrained weight file with the car-specific ones and then finetuned the StyleGAN generator, but the result was not good. It seems that the generator was not finetuned at all...
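
For anyone following along, the e4e part of the weight swap described above is roughly the following. This is only a minimal sketch, assuming the pSp wrapper from the encoder4editing codebase and an illustrative checkpoint name such as `e4e_cars_encode.pt`; the exact pSp constructor arguments can differ slightly between forks (some take the device as an extra argument).

```python
import argparse
import torch

from models.psp import pSp  # e4e wrapper class from the encoder4editing repo

device = 'cuda'
e4e_path = 'models/e4e_cars_encode.pt'  # illustrative car-specific e4e checkpoint name

ckpt = torch.load(e4e_path, map_location='cpu')
opts = ckpt['opts']
opts['checkpoint_path'] = e4e_path      # pSp re-reads the weights from this path
e4e_net = pSp(argparse.Namespace(**opts)).eval().to(device)

# A car e4e is trained against the 512-resolution StyleGAN2, so the w+ codes it
# returns have fewer style layers than the 1024 FFHQ encoder produces.
```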

@mchong6
Owner

mchong6 commented Apr 4, 2022

Can you share the results? One potential source of bugs is that the face StyleGAN generates at 1024 resolution while the car model, I believe, is at 512, so you might have to change some variables to ensure the right model is loaded.

I have tried it on churches and the results are fine, so I don't see why it wouldn't work on cars.
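
A sketch of the variable change being suggested, assuming the rosinality-style Generator class this repo uses and an illustrative checkpoint filename; the key point is that the size passed to the constructor has to match the checkpoint:

```python
import math
import torch

from model import Generator  # rosinality-style StyleGAN2 generator used by JoJoGAN

device = 'cuda'
latent_dim = 512
size = 512  # 1024 for the FFHQ face model; the car checkpoint is (per the comment above) 512

generator = Generator(size, latent_dim, 8, 2).to(device)
ckpt = torch.load('models/stylegan2-car-config-f.pt', map_location='cpu')  # illustrative name
generator.load_state_dict(ckpt['g_ema'], strict=False)

# Anything hard-coded around 18 w+ layers (latent slicing, id_swap indices, etc.)
# should use generator.n_latent instead; at 512 resolution it is 16, not 18.
assert generator.n_latent == 2 * int(math.log2(size)) - 2
```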

@HOKINGLOK
Author

HOKINGLOK commented Apr 5, 2022

Thanks for your reply. Yes, I think the model and the corresponding weights are loaded without bugs. Below is one of my results: the first image is the output of the finetuned generator, and the second is the result of feeding the inversion latent code into the generator without finetuning. The change between them is very slight, and the direction of the change seems wrong (only the background and the ground are updating). I also notice that the loss fluctuates a lot during finetuning, even when I decrease the learning rate to 2e-4. Sometimes the loss drops to a very low level very quickly, which I think may be one reason the generator was not finetuned correctly.
[Images attached: output of the finetuned generator and output of the generator without finetuning.]
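
For context on the loss being discussed, the finetuning step is roughly the following. This is a sketch using the rosinality-style Generator call and assuming, as in this repo's feature-matching loss, a discriminator that returns a list of intermediate feature maps; `latents`, `targets`, `generator`, `discriminator`, and `num_iter` are assumed to be set up already, and the style-mixing / id_swap step from the notebook is omitted for brevity.

```python
import torch
import torch.nn.functional as F
from torch import optim

# latents: e4e inversions of the style references, shape [N, n_latent, 512]
# targets: style reference images at the generator's resolution, shape [N, 3, H, W]

g_optim = optim.Adam(generator.parameters(), lr=2e-4, betas=(0, 0.99))  # lowered LR from the comment above

for step in range(num_iter):
    img, _ = generator([latents], input_is_latent=True)

    # Feature-matching loss against the frozen discriminator's intermediate activations.
    with torch.no_grad():
        real_feats = discriminator(targets)
    fake_feats = discriminator(img)
    loss = sum(F.l1_loss(f, r) for f, r in zip(fake_feats, real_feats)) / len(fake_feats)

    g_optim.zero_grad()
    loss.backward()
    g_optim.step()
```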

@mchong6
Owner

mchong6 commented Apr 5, 2022

What do the figures mean? I assume the first image is your style reference; what about the others? And what does the inversion of the style reference look like?

@HOKINGLOK
Author

Yes, the first figure is the style reference, the second one is the test input, and the last one is the result of feeding the inversion code of the test input into the finetuned generator / original generator. I think we have solved the problem by adjusting the discriminator's structure, since there were some conflicts between the model and the weights file. But we also find that finetuning a car-specific JoJoGAN is harder than a human-face one (it usually takes more iterations to reach the same level of loss). Moreover, the content of the test input is often lost in the inference result. Could this be related to the StyleGAN generator?
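
The structural adjustment mentioned above presumably amounts to building the discriminator at the checkpoint's resolution before loading the car weights; a sketch, assuming the rosinality-style Discriminator class and an illustrative checkpoint filename:

```python
import torch

from model import Discriminator  # rosinality-style discriminator used for the feature loss

device = 'cuda'
size = 512  # must match the resolution the checkpoint was trained at

discriminator = Discriminator(size, channel_multiplier=2).eval().to(device)
ckpt = torch.load('models/stylegan2-car-config-f.pt', map_location='cpu')  # illustrative name

# With strict=False a resolution mismatch is skipped silently, so inspect what was left out;
# both lists should be (close to) empty when the architecture matches the weights.
missing, unexpected = discriminator.load_state_dict(ckpt['d'], strict=False)
print('missing:', missing, 'unexpected:', unexpected)
```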

@mchong6
Owner

mchong6 commented Apr 6, 2022

Ah, that is right. The discriminator loss function assumes that the image is 1024; I forgot about that, good catch. It seems like the inversion is really bad in your case, though I'm not sure why. The inverted car looks nothing like the input car. Poor GAN inversion could be the reason why it takes longer to train and might give poorer results.

@HOKINGLOK
Author

Understood, thank you very much!

@dongyun-kim-arch

@mchong6 Hi! I am also trying to use the face model at 512x512 resolution, but it seems there are some problems I couldn't catch. Could you point out which parts I should change to run the 512x512 model? Thank you!
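
For a 512x512 model, the resolution-dependent spots discussed earlier in this thread boil down to roughly the following sketch (filenames and values are assumptions; adapt them to your own 512 checkpoints):

```python
import math
import torch

from model import Generator, Discriminator  # rosinality-style classes used by JoJoGAN

device = 'cuda'
size = 512                                   # 1024 in the stock FFHQ setup
latent_dim = 512                             # style vector width stays the same
n_latent = 2 * int(math.log2(size)) - 2      # 16 w+ layers at 512 instead of 18 at 1024

generator = Generator(size, latent_dim, 8, 2).to(device)
discriminator = Discriminator(size, 2).to(device)

# The e4e encoder must be one trained against the same 512 StyleGAN2, and any
# alignment / Resize transforms feeding it (and the feature loss) must output
# size x size crops, otherwise the inversion and the loss see the wrong scale.
```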
