
Not an issue - richer datasets #6

Open
johndpope opened this issue Jul 25, 2021 · 7 comments

@johndpope

Are you familiar with this: https://twitter.com/e08477/status/1418440857578098691?s=21 ?

I want to do cityscape shots. Are you familiar with any relevant datasets?
Can this repo help output higher-quality images, or does it help with the prompting?

@mehdidc (Owner) commented Jul 26, 2021

Hi, I was not aware of these; they are very beautiful!

The repo is not meant to output higher-quality images (quality should be the same as in the VQGAN-CLIP examples) or to help with prompting. It is meant to do the same thing without needing an optimization loop for each prompt, and it can also generalize to new prompts that were not seen in the training set. All you need is to collect or build a dataset of prompts and train the model on it; once that is done, you can generate images from new prompts in a single step (so no optimization loop). I will also shortly upload pre-trained model(s) based on Conceptual Captions 12M prompts (https://github.com/google-research-datasets/conceptual-12m), if you would like to give it a try without re-training from scratch. Also, since you obtain a model at the end, you can additionally interpolate between the generated images of different prompts. I hope the goal of the repo is clearer.
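
To make the "no optimization loop" point concrete, here is a rough, self-contained PyTorch sketch of the two workflows. The linear layers and dimensions below are dummy stand-ins for the VQGAN decoder, the CLIP similarity, and the trained prompt-to-latent network; they are not this repo's actual API.

```python
# Rough sketch: classic VQGAN-CLIP optimizes a latent for every prompt, while
# the feed-forward model maps a prompt embedding to a latent in a single pass.
# The nn.Linear modules are dummy stand-ins, NOT this repo's actual API.
import torch
import torch.nn as nn

latent_dim, embed_dim, image_dim = 256, 512, 3 * 64 * 64
decoder = nn.Linear(latent_dim, image_dim)            # stand-in for the VQGAN decoder
scorer = nn.Linear(image_dim + embed_dim, 1)          # stand-in for CLIP image/text similarity
prompt_to_latent = nn.Linear(embed_dim, latent_dim)   # stand-in for the trained feed-forward model

def optimize_per_prompt(text_embed, steps=500, lr=0.1):
    """Classic VQGAN-CLIP: hundreds of gradient steps for each prompt."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        image = decoder(z)
        loss = -scorer(torch.cat([image, text_embed], dim=-1)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z)

def feed_forward_generate(text_embed):
    """This repo's idea: one forward pass of a trained network, no per-prompt loop."""
    with torch.no_grad():
        return decoder(prompt_to_latent(text_embed))

text_embed = torch.randn(1, embed_dim)                # stand-in for a CLIP text embedding
slow = optimize_per_prompt(text_embed, steps=5)       # per-prompt optimization (5 steps for demo)
fast = feed_forward_generate(text_embed)              # single forward pass
```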

@johndpope (Author)

"so no optimization loop" -
does that mean there's no 500x iterations to get a good looking image?

FYI: @nerdyrodent

@mehdidc (Owner) commented Jul 26, 2021

" does that mean there's no 500x iterations to get a good looking image?" Yes

@mehdidc (Owner) commented Jul 26, 2021

Following the tweet you mentioned above, here is an example with "deviantart, volcano": https://imgur.com/a/cYMsNo5, generated with a model currently being trained on Conceptual Captions 12M.

@mehdidc (Owner) commented Jul 27, 2021

@johndpope I added a bunch of pre-trained models if you want to give them a try.

@johndpope (Author)

I had a play with the 1.7 GB cc12m_32x1024 model. I couldn't get the high quality I was getting with VQGAN-CLIP; I'll keep trying and bumping up the dimensions. The docs could maybe use some pointers on output sizes (256x256, 512x512, etc.).
One thing is clear: this runs very quickly. Perhaps there could be an effort to serve it "hot", so you could give it a new prompt as a running service, almost in real time, without turning off the engine so to speak. We talk about FPS (frames per second); could we see a VQPS?

Here are some images I turned out over the weekend:
nerdyrodent/VQGAN-CLIP#13

Observations
When I threw in a parameter, it was clearly identifiable.
"Los Angeles | 35mm"
E.g. https://twitter.com/johndpope/status/1419352229031518209/photo/1

Los Angeles Album Cover
https://twitter.com/johndpope/status/1419354082192412679/photo/1

This one didn't quite cut it:
python -u main.py test pretrained_models/cc12m_32x1024/model.th "los angeles album cover"

Another improvement for newcomers: you could consider integrating these model downloads into the README:
https://github.com/nerdyrodent/VQGAN-CLIP/blob/5edb6a133944ee735025b8a92f6432d6c5fbf5eb/download_models.sh

@afiaka87 (Contributor) commented Jul 29, 2021

@johndpope have you considered re-embedding the outputs from the trained VitGAN as CLIP image embeddings, and then using those as prompts for a "normal" VQGAN-CLIP optimization with a much higher learning rate than usual and fewer steps? That would allow you to use non-square dimensions.
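
A minimal sketch of that re-embedding step, assuming the OpenAI clip package. "generated.png" is a placeholder for an image produced by the trained model, and how the resulting embedding is then passed to a particular VQGAN-CLIP script depends on that script.

```python
# Minimal sketch of re-embedding a generated image with CLIP (OpenAI clip package).
# "generated.png" is a placeholder for an output of the trained feed-forward model.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("generated.png")).unsqueeze(0).to(device)
with torch.no_grad():
    image_embed = model.encode_image(image)
    image_embed = image_embed / image_embed.norm(dim=-1, keepdim=True)  # normalize like CLIP text embeds

# image_embed can now serve as the optimization target in place of (or alongside)
# a text prompt, typically with a higher learning rate and fewer steps.
```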

Also, one of the other primary benefits of this approach is that if you'd like to fine-tune from one of the checkpoints, or even train your own model from scratch, this can be relatively simple: all you need are some captions, which can be generated or typed out. You'll want to cover a large-ish corpus, but using something like the provided MIT States captions as a base should be a good start.
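
As an illustration of how such a caption corpus could be assembled, here is a hypothetical sketch that expands a few templates into a plain-text file with one caption per line. The exact format expected by the training script may differ, so check the repo's configs.

```python
# Hypothetical example of building a small caption corpus from templates,
# written out one caption per line. The exact format expected by the training
# script may differ; check the repo's configs before using this.
import itertools

subjects = ["volcano", "city skyline", "forest", "ocean at sunset"]
styles = ["deviantart", "35mm photograph", "album cover", "oil painting"]

captions = [f"{style}, {subject}" for subject, style in itertools.product(subjects, styles)]

with open("my_captions.txt", "w") as f:
    f.write("\n".join(captions))
```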

Thanks for the extra info. I'm a little busy today, but I think the README might need one or two more things, and possibly a Colab notebook specific to training (if we don't have one already) that would make it easy to customize MIT States.

Edit: real-time updates to your captions / display of the rate of generation, etc., may be outside the scope of the project.
