fix and improve: VAE tiling #372
- properly handle the upper left corner interpolating both x and y
- refactor out lerp
- use smootherstep to preserve more detail and spend less area blending
6bfc471 to 22e48d8
I played around a bit with the interpolation code and ended up with this. It seems to be almost flawless, except that the right and bottom borders get washed out:
// Would be great to have access to the max values for x and y
const float x_f_0 = x>0 ? ix / float(overlap) : 1;
const float x_f_1 = (width - ix) / float(overlap);
const float y_f_0 = y>0 ? iy / float(overlap) : 1;
const float y_f_1 = (height - iy) / float(overlap);
const float x_f = std::min(x_f_0,x_f_1);
const float y_f = std::min(y_f_0,y_f_1);
ggml_tensor_set_f32(
    output,
    old_value + new_value * ggml_smootherstep_f32(y_f < 1 ? y_f : 1) * ggml_smootherstep_f32(x_f < 1 ? x_f : 1),
    x + ix, y + iy, k
);
Ok I got it! I could get the image width and height from the output tensor:
// unclamped -> expects x in the range [0-1]
__STATIC_INLINE__ float ggml_smootherstep_f32(const float x) {
    GGML_ASSERT(x >= 0.f && x <= 1.f);
    return x * x * x * (x * (6.0f * x - 15.0f) + 10.0f);
}

__STATIC_INLINE__ void ggml_merge_tensor_2d(struct ggml_tensor* input,
                                            struct ggml_tensor* output,
                                            int x,
                                            int y,
                                            int overlap) {
    int64_t width      = input->ne[0];
    int64_t height     = input->ne[1];
    int64_t img_width  = output->ne[0];
    int64_t img_height = output->ne[1];
    int64_t channels   = input->ne[2];
    GGML_ASSERT(input->type == GGML_TYPE_F32 && output->type == GGML_TYPE_F32);
    for (int iy = 0; iy < height; iy++) {
        for (int ix = 0; ix < width; ix++) {
            for (int k = 0; k < channels; k++) {
                float new_value = ggml_tensor_get_f32(input, ix, iy, k);
                if (overlap > 0) {  // blend colors in overlapped area
                    float old_value = ggml_tensor_get_f32(output, x + ix, y + iy, k);

                    const float x_f_0 = x > 0 ? ix / float(overlap) : 1;
                    const float x_f_1 = x < (img_width - width) ? (width - ix) / float(overlap) : 1;
                    const float y_f_0 = y > 0 ? iy / float(overlap) : 1;
                    const float y_f_1 = y < (img_height - height) ? (height - iy) / float(overlap) : 1;

                    const float x_f = std::min(x_f_0, x_f_1);
                    const float y_f = std::min(y_f_0, y_f_1);

                    ggml_tensor_set_f32(
                        output,
                        old_value + new_value * ggml_smootherstep_f32(y_f < 1 ? y_f : 1) * ggml_smootherstep_f32(x_f < 1 ? x_f : 1),
                        x + ix, y + iy, k
                    );
                } else {
                    ggml_tensor_set_f32(output, new_value, x + ix, y + iy, k);
                }
            }
        }
    }
}
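To see what the two border checks change, here is a minimal 1D sketch of the per-axis weight. The names `smootherstep` and `axis_weight` and the sizes used in the note below are illustrative assumptions, not part of the PR; the formula mirrors `x_f_0` / `x_f_1` from the snippet above.

```cpp
#include <algorithm>
#include <cassert>

// Same polynomial as ggml_smootherstep_f32 above, without the range assert.
static float smootherstep(float x) {
    return x * x * x * (x * (6.0f * x - 15.0f) + 10.0f);
}

// Hypothetical helper: blend weight along one axis for pixel `i` of a tile
// that starts at `pos` and has size `tile`, inside an image of size `img`.
static float axis_weight(int i, int pos, int tile, int img, int overlap) {
    // Ramp up from the leading edge, unless the tile touches the image start.
    const float f0 = pos > 0 ? i / float(overlap) : 1.0f;
    // Ramp down toward the trailing edge, unless the tile touches the image end.
    const float f1 = pos < img - tile ? (tile - i) / float(overlap) : 1.0f;
    // Clamp to 1 before applying smootherstep, as the merge function does.
    return smootherstep(std::min(std::min(f0, f1), 1.0f));
}
```

With an assumed 96-pixel axis, 64-pixel tiles and an overlap of 32, the last tile starts at `pos = 32`, so `axis_weight(63, 32, 64, 96, 32)` stays at 1 on the image border. Without the `pos < img - tile` check that pixel would get a weight near 0 with no neighbouring tile to make up the difference, which would explain the washed-out right and bottom borders in the earlier version.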
Why does this work? old_value + new_value * ggml_smootherstep_f32(y_f) * ggml_smootherstep_f32(x_f). This is not a lerp, it is adding the new value x (btw totally forgot you can multiply values in the
Assuming the following tiles: If you ignore the
If you're on an edge instead of a corner, either x or y is clamped to 0 or 1.
Yea I get that (thanks for the effort), but what I meant was, we are summing the color here.
It's equivalent to a bilinear interpolation. Just done in multiple steps because we don't have access to all the values at the same time.
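One way to check the "summing" argument is to note that smootherstep satisfies s(t) + s(1 - t) = 1, so the weights of overlapping tiles add up to 1 at every pixel. The sketch below accumulates constant-valued tiles into a zero-initialized buffer and measures the deviation. It assumes a square single-channel image, a tile stride of `tile - overlap`, and a zeroed output; `max_blend_error` and `axis_weight` are hypothetical names, not functions from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

static float smootherstep(float x) {
    return x * x * x * (x * (6.0f * x - 15.0f) + 10.0f);
}

// Per-axis weight, mirroring x_f_0 / x_f_1 in ggml_merge_tensor_2d.
static float axis_weight(int i, int pos, int tile, int img, int overlap) {
    const float f0 = pos > 0 ? i / float(overlap) : 1.0f;
    const float f1 = pos < img - tile ? (tile - i) / float(overlap) : 1.0f;
    return smootherstep(std::min(std::min(f0, f1), 1.0f));
}

// Accumulate tiles that all hold `value` into a zeroed img x img buffer and
// return the largest deviation from `value` after merging.
static float max_blend_error(int img, int tile, int overlap, float value) {
    std::vector<float> out(img * img, 0.0f);
    const int stride = tile - overlap;  // assumed tile spacing
    for (int y = 0; y + tile <= img; y += stride) {
        for (int x = 0; x + tile <= img; x += stride) {
            for (int iy = 0; iy < tile; iy++) {
                for (int ix = 0; ix < tile; ix++) {
                    // Same update as the PR: old_value + new_value * wy * wx.
                    out[(y + iy) * img + (x + ix)] +=
                        value * axis_weight(iy, y, tile, img, overlap) *
                                axis_weight(ix, x, tile, img, overlap);
                }
            }
        }
    }
    float err = 0.0f;
    for (float v : out) {
        err = std::max(err, std::fabs(v - value));
    }
    return err;
}
```

For example, `max_blend_error(96, 64, 32, 0.5f)` should come out at floating-point rounding level, including at the point where four tiles meet, because the 2D weight is the product of two per-axis weights that each sum to 1.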
Co-authored-by: stduhpf <[email protected]>
Pushed, should be ready to merge. 🎉
Thank you for your contribution.
Thanks for the merge. I'll do some benchmarking to see if it's worth making a PR for optimizing this loop.
I can gain a bit over 40 ms (out of 18 s, so 0.22% difference) for 56 tiles by optimizing this out. It's measurable, but not really significant.
Yeah, I thought about optimizing it too, but then I remembered that simplicity is key, and this code is not using enough compute to bother. :)
* fix and improve: VAE tiling
  - properly handle the upper left corner interpolating both x and y
  - refactor out lerp
  - use smootherstep to preserve more detail and spend less area blending
* actually fix vae tile merging
  Co-authored-by: stduhpf <[email protected]>
* remove the now unused lerp function
---------
Co-authored-by: stduhpf <[email protected]>
Completely fixes tiling seams!
Thanks to @stduhpf
fixes #353.
Old Details
Here is a test image, which replaces each tile with a single uniform shade of gray. The lines are the midway points of the overlap.
Before:
After:
And now with a proper test image.
Before:
After:
Without tiling:
IMO a significant improvement. There are however still vertical seams. I am a bit out of ideas now, so I wanted to contribute what I already have.
Some more details can be found in #353.
This is ready to merge.
Before:
After: