Fix Flux: clip_l support (SD3/3.5 improvements included) #397
base: master
Conversation
If anyone knows how this is all supposed to work, feel free to improve the code.
I'm pretty sure this is not how it's supposed to work, so I'm keeping this PR as a draft until I figure out how to make it work properly.
Ok, I believe I've got it now.
Hard-coding the prompt for clip to always be "Painting, in the style of starry night by Van Gogh":

Patch for hardcoded Clip prompt:

```diff
diff --git a/conditioner.hpp b/conditioner.hpp
index 8a710d1..75efb08 100644
--- a/conditioner.hpp
+++ b/conditioner.hpp
@@ -1038,9 +1038,10 @@ struct FluxCLIPEmbedder : public Conditioner {
         std::vector<float> t5_weights;
         for (const auto& item : parsed_attention) {
             const std::string& curr_text = item.first;
+            const std::string& curr_text_l = "Painting, in the style of starry night by Van Gogh";
             float curr_weight = item.second;
-            std::vector<int> curr_tokens = clip_l_tokenizer.encode(curr_text, on_new_token_cb);
+            std::vector<int> curr_tokens = clip_l_tokenizer.encode(curr_text_l, on_new_token_cb);
             clip_l_tokens.insert(clip_l_tokens.end(), curr_tokens.begin(), curr_tokens.end());
             clip_l_weights.insert(clip_l_weights.end(), curr_tokens.size(), curr_weight);
```
Edit: I tried this test with my previous attempt at fixing clip (e6314d3) and got a similar result. It was actually working-ish, even though it was definitely not implemented like the official Flux inference code.
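For anyone reading the patch above: each chunk of the parsed prompt is tokenized, and every token produced from that chunk inherits the chunk's attention weight. A simplified, self-contained illustration of that pattern (the `fake_encode` tokenizer is a stand-in, not the project's API):

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-in tokenizer: pretend every 4 characters become one token id.
static std::vector<int> fake_encode(const std::string& text) {
    return std::vector<int>(text.size() / 4 + 1, 42);
}

int main() {
    // (chunk, weight) pairs as produced by prompt attention parsing
    std::vector<std::pair<std::string, float>> parsed_attention = {
        {"a painting of a cat", 1.0f},
        {"starry night style", 1.2f},
    };

    std::vector<int> tokens;
    std::vector<float> weights;
    for (const auto& item : parsed_attention) {
        std::vector<int> curr = fake_encode(item.first);
        tokens.insert(tokens.end(), curr.begin(), curr.end());
        // one weight entry per token, all equal to the chunk's weight
        weights.insert(weights.end(), curr.size(), item.second);
    }
    // tokens.size() == weights.size() always holds here
    return 0;
}
```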
@leejet does this look good enough to merge?
conditioner.hpp (Outdated)

```diff
@@ -1073,7 +1065,7 @@ struct FluxCLIPEmbedder : public Conditioner {
         return {{clip_l_tokens, clip_l_weights}, {t5_tokens, t5_weights}};
     }

-    SDCondition get_learned_condition_common(ggml_context* work_ctx,
+    SDCondition get_learned_condition_common(ggml_context* work_ctx,
```
extra spaces?
clip.hpp (Outdated)

```diff
@@ -711,7 +711,11 @@ class CLIPTextModel : public GGMLBlock {
         if (return_pooled) {
             auto text_projection = params["text_projection"];
             ggml_tensor* pooled = ggml_view_1d(ctx, x, hidden_size, x->nb[1] * max_token_idx);
-            pooled = ggml_mul_mat(ctx, ggml_cont(ctx, ggml_transpose(ctx, text_projection)), pooled);
+            if(text_projection != NULL){
```
Try to fit in with the surrounding formatting (spaces).
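For reference, a minimal sketch of the guarded projection written as a standalone helper (the `project_pooled` name is hypothetical and the ggml calls just follow the snippet above; the actual patch may look different):

```cpp
#include "ggml.h"

// Sketch, not the final patch: apply text_projection to the pooled hidden
// state when that tensor was loaded, otherwise pass it through unchanged,
// which is the same as projecting by the identity matrix.
static struct ggml_tensor* project_pooled(struct ggml_context* ctx,
                                          struct ggml_tensor* pooled,
                                          struct ggml_tensor* text_projection) {
    if (text_projection != NULL) {
        // same formulation as the snippet above: pooled = text_projection^T * pooled
        pooled = ggml_mul_mat(ctx, ggml_cont(ctx, ggml_transpose(ctx, text_projection)), pooled);
    }
    return pooled;
}
```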
This breaks SD3.5 support somehow!
This reverts commit de973c1.
Ok, it works again now.
Fixes #396
I made the clip backend skip the last text projection if the `text_projection` tensor doesn't exist. This is mathematically equivalent to replacing the `text_projection` with the identity matrix (aka `torch.eye()`); see the short equivalence note after the results below. Also replaced the matrix multiplication with a biasless linear layer when `text_projection` exists (somehow this changes the outcome).

Flux.1 Schnell (q3_k):
SD3 2B (q8_0):
SD3.5 Large Turbo (q4_1):
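A quick way to see the identity-matrix equivalence mentioned above (writing the projection abstractly; the exact transpose convention used by ggml doesn't change the argument):

$$
\text{pooled}' = W^{\top}\,\text{pooled}, \qquad W = I \;\Rightarrow\; \text{pooled}' = I^{\top}\,\text{pooled} = \text{pooled}
$$

So skipping the multiplication when the `text_projection` tensor is missing yields the same pooled embedding as multiplying by `torch.eye(hidden_size)`.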