
SD3.5-large (8B) support #442

Closed
stduhpf opened this issue Oct 22, 2024 · 12 comments
stduhpf (Contributor) commented Oct 22, 2024

Stable Diffusion 3.5 Large and Large Turbo just got released publicly.
https://huggingface.co/stabilityai/stable-diffusion-3.5-large
https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo

Inference code here (warning: weird licence): https://github.com/Stability-AI/sd3.5

It's a model that should perform fairly well (SD3-Large is ranked slightly above Flux Schnell on the Artificial Analysis arena leaderboard, and this is an upgraded version of SD3-Large) while being smaller than Flux (it has 8B parameters).

Right now, these two models are not supported by sdcpp (I tried).

What's required:

Sidenote: SD3.5 Medium (2B) is also going to be released soon, hopefully it will work as a drop-in replacement for SD3 2B

Edit: About quantization: the majority of tensors in SD3.5 Large do not fit evenly into a whole number of 256-element blocks, so they are skipped when quantizing to q3_K, q4_K, and so on.
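The block-size issue above can be illustrated with a small check (a sketch; `QK_K = 256` is ggml's k-quant super-block size, and the 2432 hidden size for SD3.5-large is taken from public model configs):

```python
# Minimal sketch of the k-quant block-size constraint described above.
# ggml's k-quants (q3_K, q4_K, q5_K, q6_K) pack weights in super-blocks
# of QK_K = 256 elements, so a tensor row must hold a whole number of
# blocks to qualify.
QK_K = 256

def can_k_quantize(row_size: int) -> bool:
    """True if a row of this length fits a whole number of 256-element blocks."""
    return row_size % QK_K == 0

# SD3.5-large's MMDiT hidden size is reportedly 2432 (38 heads x 64),
# and 2432 / 256 = 9.5, so those tensors fall back to a non-k quant
# type or stay unquantized.
print(can_k_quantize(4096))  # True
print(can_k_quantize(2432))  # False
```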

@stduhpf stduhpf changed the title SD3.5-large support SD3.5-large (8B) support Oct 22, 2024
stduhpf (Contributor, Author) commented Oct 23, 2024

I noticed there are some slight differences between sd3 and sd3.5 architecture diagrams. Not sure if this can cause problems.

[Architecture diagrams: SD3 (2B) MMDiT vs. SD3.5 (8B) MMDiT]

Text embeddings are now 77+77/256 tokens instead of 77+77 tokens (not sure what "/256" means here; it's probably not a division).
And the RMS norm before attention in the DiT block is no longer optional.
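For reference, the RMS norm in question can be sketched in a few lines (a minimal NumPy version; the epsilon value is an assumption):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Scale each vector by the inverse of its root-mean-square along the
    # last axis, then apply a per-channel gain. Unlike LayerNorm, there is
    # no mean subtraction and no bias.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

x = np.array([[3.0, 4.0]])
out = rms_norm(x, np.ones(2))
# RMS of [3, 4] is sqrt(12.5) ~= 3.5355, so out is roughly [0.8485, 1.1314]
```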

leejet (Owner) commented Oct 24, 2024

It's currently supported, see #445.

stduhpf (Contributor, Author) commented Oct 24, 2024

@razvanab it should work. I don't see a reason why they wouldn't be supported.

@stduhpf stduhpf closed this as completed Oct 24, 2024
razvanab commented

It does nothing; it just goes to the cmd prompt again.

➜ .\sd.exe -m  "J:\LLM_MODELS\SD\sd3.5_large_turbo-Q5_0.gguf" --clip_l "J:\LLM_MODELS\SD\clip\clip_l.safetensors" --clip_g "J:\LLM_MODELS\SD\clip\clip_vision_g.safetensors" --t5xxl "J:\LLM_MODELS\SD\clip\t5-v1_1-xxl-encoder-Q5_K_M.gguf"  -H 1024 -W 1024 -p "a lovely cat " --cfg-scale 4.5 --sampling-method euler --verbose
Option:
    n_threads:         8
    mode:              txt2img
    model_path:        J:\LLM_MODELS\SD\sd3.5_large_turbo-Q5_0.gguf
    wtype:             unspecified
    clip_l_path:       J:\LLM_MODELS\SD\clip\clip_l.safetensors
    clip_g_path:       J:\LLM_MODELS\SD\clip\clip_vision_g.safetensors
    t5xxl_path:        J:\LLM_MODELS\SD\clip\t5-v1_1-xxl-encoder-Q5_K_M.gguf
    diffusion_model_path:
    vae_path:
    taesd_path:
    esrgan_path:
    controlnet_path:
    embeddings_path:
    stacked_id_embeddings_path:
    input_id_images_path:
    style ratio:       20.00
    normalize input image :  false
    output_path:       output.png
    init_img:
    control_image:
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    strength(control): 0.90
    prompt:            a lovely cat
    negative_prompt:
    min_cfg:           1.00
    cfg_scale:         4.50
    guidance:          3.50
    clip_skip:         -1
    width:             1024
    height:            1024
    sample_method:     euler
    schedule:          default
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info:
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:159  - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1060 6GB, compute capability 6.1, VMM: yes
[INFO ] stable-diffusion.cpp:197  - loading model from 'J:\LLM_MODELS\SD\sd3.5_large_turbo-Q5_0.gguf'
[INFO ] model.cpp:801  - load J:\LLM_MODELS\SD\sd3.5_large_turbo-Q5_0.gguf using gguf format
[DEBUG] model.cpp:818  - init from 'J:\LLM_MODELS\SD\sd3.5_large_turbo-Q5_0.gguf'
[INFO ] stable-diffusion.cpp:204  - loading clip_l from 'J:\LLM_MODELS\SD\clip\clip_l.safetensors'
[INFO ] model.cpp:804  - load J:\LLM_MODELS\SD\clip\clip_l.safetensors using safetensors format
[DEBUG] model.cpp:872  - init from 'J:\LLM_MODELS\SD\clip\clip_l.safetensors'
[INFO ] stable-diffusion.cpp:211  - loading clip_g from 'J:\LLM_MODELS\SD\clip\clip_vision_g.safetensors'
[INFO ] model.cpp:804  - load J:\LLM_MODELS\SD\clip\clip_vision_g.safetensors using safetensors format
[DEBUG] model.cpp:872  - init from 'J:\LLM_MODELS\SD\clip\clip_vision_g.safetensors'
[INFO ] stable-diffusion.cpp:218  - loading t5xxl from 'J:\LLM_MODELS\SD\clip\t5-v1_1-xxl-encoder-Q5_K_M.gguf'
[INFO ] model.cpp:801  - load J:\LLM_MODELS\SD\clip\t5-v1_1-xxl-encoder-Q5_K_M.gguf using gguf format
[DEBUG] model.cpp:818  - init from 'J:\LLM_MODELS\SD\clip\t5-v1_1-xxl-encoder-Q5_K_M.gguf'
[INFO ] stable-diffusion.cpp:244  - Version: SD3.5 8B
[INFO ] stable-diffusion.cpp:275  - Weight type:                 q5_0
[INFO ] stable-diffusion.cpp:276  - Conditioner weight type:     f16

stduhpf (Contributor, Author) commented Oct 24, 2024

@razvanab I can confirm, it doesn't work. You'll have to quantize it yourself with sdcpp, or wait for someone else to do it and upload the models to Huggingface.

razvanab commented

I see.
Thanks

razvanab commented Oct 24, 2024

Ok, now I get this error. I should probably wait for someone who knows what they're doing to quantize it.

Error.txt

I quantized t5xxl too, and that got rid of some errors. But I still get a lot of errors for:

clip_g.safetensors

Never mind, my mistake: I hadn't grabbed the correct clip_g.safetensors file.

Sorry about this.

stduhpf (Contributor, Author) commented Oct 25, 2024

@razvanab I'm uploading some here if you want: https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp

razvanab commented

I did the same last night, but I forgot to post it here.
If you want, you can take the t5xxl model from there and post it on your repo.

https://huggingface.co/razvanab/SDCpp

stduhpf (Contributor, Author) commented Oct 25, 2024

@razvanab Btw you can find more compatible t5xxl quants and clip-l quants here: https://huggingface.co/Green-Sky/flux.1-schnell-GGUF/tree/main.

razvanab commented

Oh, nice, a t5xxl q8_0 under 6 GB.
For some reason, my q8_0 ended up being over 6 GB.

Thanks.
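The size difference is easy to sanity-check with back-of-envelope arithmetic (a sketch; the ~4.7B parameter count for the encoder-only T5-XXL is an assumption, and the q8_0 layout of 32 int8 weights plus one f16 scale per block is ggml's):

```python
# Rough expected file size for an encoder-only t5xxl at q8_0.
params = 4.7e9            # encoder-only T5-XXL parameter count (assumption)
bytes_per_block = 32 + 2  # q8_0 block: 32 int8 weights + one f16 scale
weights_per_block = 32

size_gb = params * bytes_per_block / weights_per_block / 1e9
print(f"~{size_gb:.2f} GB")  # ~4.99 GB, comfortably under 6 GB
```

If a q8_0 quant comes out well over that, some tensors were likely kept at a wider type such as f16 (embeddings and norms often are).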
