fix and improve: VAE tiling #372

Green-Sky · 2024-08-26T12:51:46Z

properly handle the upper left corner interpolating both x and y
use smootherstep to preserve more detail and spend less area blending

Completely fixes tiling seams!
Thanks to @stduhpf

fixes #353.

Old Details

Here is a test image, which replaces each tile with a single uniform shade of gray. The lines are the midway points of the overlap.

Before:

After:

And now with a proper test image.

Before:

After:

Without tiling:

IMO a significant improvement. There are however still vertical seams. I am a bit out of ideas now, so I wanted to contribute what I already have.

Some more details can be found in #353 .

This is ready to merge.

Before:

After:

- properly handle the upper left corner interpolating both x and y - refactor out lerp - use smootherstep to preserve more detail and spend less area blending

stduhpf · 2024-08-26T15:57:04Z

I played around a bit with the interpolation code, and ended up with this. It seems to be almost flawless, except for the right and bottom borders that get washed out:

// Would be great to have acess to the max values for x and y
  const float x_f_0 = x>0 ? ix / float(overlap) : 1;
  const float x_f_1 = (width - ix) / float(overlap);
  const float y_f_0 = y>0 ? iy / float(overlap) : 1;
  const float y_f_1 = (height - iy) / float(overlap);

  const float x_f = std::min(x_f_0,x_f_1);
  const float y_f = std::min(y_f_0,y_f_1);

  ggml_tensor_set_f32(
      output,
      old_value +  new_value * ggml_smootherstep_f32(y_f<1 ? y_f : 1) * ggml_smootherstep_f32(x_f<1 ? x_f : 1),
      x + ix, y + iy, k
  );

stduhpf · 2024-08-26T16:06:51Z

Ok I got it! I could get the image width and height from the output tensor

// unclamped -> expects x in the range [0-1]
__STATIC_INLINE__ float ggml_smootherstep_f32(const float x) {
    GGML_ASSERT(x >= 0.f && x <= 1.f);
    return x * x * x * (x * (6.0f * x - 15.0f) + 10.0f);
}

__STATIC_INLINE__ void ggml_merge_tensor_2d(struct ggml_tensor* input,
                                            struct ggml_tensor* output,
                                            int x,
                                            int y,
                                            int overlap) {
    int64_t width    = input->ne[0];
    int64_t height   = input->ne[1];

    int64_t img_width    = output->ne[0];
    int64_t img_height   = output->ne[1];

    int64_t channels = input->ne[2];
    GGML_ASSERT(input->type == GGML_TYPE_F32 && output->type == GGML_TYPE_F32);
    for (int iy = 0; iy < height; iy++) {
        for (int ix = 0; ix < width; ix++) {
            for (int k = 0; k < channels; k++) {
                float new_value = ggml_tensor_get_f32(input, ix, iy, k);
                if (overlap > 0) {  // blend colors in overlapped area
                    float old_value = ggml_tensor_get_f32(output, x + ix, y + iy, k);

                    const float x_f_0 = x>0 ? ix / float(overlap) : 1;
                    const float x_f_1 = x<(img_width - width)? (width - ix) / float(overlap) : 1 ;
                    const float y_f_0 = y>0 ? iy / float(overlap) : 1;
                    const float y_f_1 = y<(img_height - height)? (height - iy) / float(overlap) : 1;

                    const float x_f = std::min(x_f_0,x_f_1);
                    const float y_f = std::min(y_f_0,y_f_1);
                    ggml_tensor_set_f32(
                        output,
                        old_value +  new_value * ggml_smootherstep_f32(y_f<1?y_f:1)*ggml_smootherstep_f32(x_f<1?x_f:1),
                        x + ix, y + iy, k
                    );
                }else{
                    ggml_tensor_set_f32(output, new_value, x + ix, y + iy, k);
                }
            }
        }
    }
}

Green-Sky · 2024-08-26T17:31:22Z

Why does this work?

old_value + new_value * ggml_smootherstep_f32(y_f) * ggml_smootherstep_f32(x_f),

This is not a lerp, it is adding the new value x[0,1], right?

(btw totally forgot you can multiply values in the [0,1] just like that, 😄 )

stduhpf · 2024-08-26T17:45:25Z

Why does this work?
old_value + new_value * ggml_smootherstep_f32(y_f) * ggml_smootherstep_f32(x_f),
This is not a lerp, it is adding the new value x[0,1], right?

(btw totally forgot you can multiply values in the [0,1] just like that, 😄 )

Assuming the following tiles:
A B
C D

If you ignore the smoothersteps, the values of the output tensor in the middle overlap take these succesive values:
((x,y) are the coordinates within the quadruple overlap)

(0) + A * (1-x)*(1-y)
(A * (1-x)*(1-y)) + B * x * (1.-y) = lerp(A, B, x)*(1.-y)
(lerp(A, B, x)*(1.-y)) + C * (1.-x)*y
(lerp(A, B, x)*(1.-y) + C * (1.-x)*y) + D * x * y = lerp(A, B, x)*(1.-y) + lerp(C, D, x)*y = lerp(lerp(A, B, x), lerp(C, D, x), y)

If you're on an edge instead of a corner, either x or y is clamped to 0 or 1.

Green-Sky · 2024-08-26T17:52:39Z

Yea I get that (thanks for the effort), but what I meant was, we are summing the color here.
color0 + color1 * f;
So we are not interpolating between them, something most be funny somewhere still.
I am about to push your code, adding you as a coauthor.

stduhpf · 2024-08-26T17:54:58Z

It's equivalent to a bilinear interpolation. Just done in multiple steps because we don't have access to all the values at the same time

Co-authored-by: stduhpf <[email protected]>

Green-Sky · 2024-08-26T18:07:16Z

Pushed, should be ready to merge. 🎉

ggml_extend.hpp

leejet · 2024-08-27T16:20:37Z

Thank you for your contribution.

stduhpf · 2024-08-27T16:33:29Z

Thanks for the merge.
I realize now that my code isn't quite optimal (variables are assigned inside the inner loop when they could very well be assigned outside), but the impact doesn't seem noticable (maybe the compiler optimizes it?)

I'll do some benchmarking to see if it's worth making a pr for optimizing this loop.

stduhpf · 2024-08-27T16:50:28Z

I can gain a bit over 40ms (out of 18s, so 0.22% difference) for 56 tiles by optimizing this out. It's measurable, but not really significant.

Green-Sky · 2024-08-27T20:30:11Z

Yea, I thought about optimizing it too, but then i remember that simplicity is key, and this code is not using enough compute to bother. :)

* fix and improve: VAE tiling - properly handle the upper left corner interpolating both x and y - refactor out lerp - use smootherstep to preserve more detail and spend less area blending * actually fix vae tile merging Co-authored-by: stduhpf <[email protected]> * remove the now unused lerp function --------- Co-authored-by: stduhpf <[email protected]>

Green-Sky mentioned this pull request Aug 26, 2024

WIP - Web server + conv2d fused + k-quants + dynamic gpu offloading #221

Draft

fix and improve: VAE tiling

22e48d8

- properly handle the upper left corner interpolating both x and y - refactor out lerp - use smootherstep to preserve more detail and spend less area blending

Green-Sky force-pushed the vae_tiling_fixes branch from 6bfc471 to 22e48d8 Compare August 26, 2024 12:56

actually fix vae tile merging

03afc83

Co-authored-by: stduhpf <[email protected]>

stduhpf approved these changes Aug 26, 2024

View reviewed changes

ggml_extend.hpp Outdated Show resolved Hide resolved

remove the now unused lerp function

b593aac

leejet merged commit e71ddce into leejet:master Aug 27, 2024
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix and improve: VAE tiling #372

fix and improve: VAE tiling #372

Green-Sky commented Aug 26, 2024 •

edited

Loading

stduhpf commented Aug 26, 2024 •

edited

Loading

stduhpf commented Aug 26, 2024 •

edited

Loading

Green-Sky commented Aug 26, 2024 •

edited

Loading

stduhpf commented Aug 26, 2024 •

edited

Loading

Green-Sky commented Aug 26, 2024

stduhpf commented Aug 26, 2024 •

edited

Loading

Green-Sky commented Aug 26, 2024

leejet commented Aug 27, 2024

stduhpf commented Aug 27, 2024

stduhpf commented Aug 27, 2024 •

edited

Loading

Green-Sky commented Aug 27, 2024

fix and improve: VAE tiling #372

fix and improve: VAE tiling #372

Conversation

Green-Sky commented Aug 26, 2024 • edited Loading

Before:

After:

Before:

After:

Without tiling:

Before:

After:

stduhpf commented Aug 26, 2024 • edited Loading

stduhpf commented Aug 26, 2024 • edited Loading

Green-Sky commented Aug 26, 2024 • edited Loading

stduhpf commented Aug 26, 2024 • edited Loading

Green-Sky commented Aug 26, 2024

stduhpf commented Aug 26, 2024 • edited Loading

Green-Sky commented Aug 26, 2024

leejet commented Aug 27, 2024

stduhpf commented Aug 27, 2024

stduhpf commented Aug 27, 2024 • edited Loading

Green-Sky commented Aug 27, 2024

Green-Sky commented Aug 26, 2024 •

edited

Loading

stduhpf commented Aug 26, 2024 •

edited

Loading

stduhpf commented Aug 26, 2024 •

edited

Loading

Green-Sky commented Aug 26, 2024 •

edited

Loading

stduhpf commented Aug 26, 2024 •

edited

Loading

stduhpf commented Aug 26, 2024 •

edited

Loading

stduhpf commented Aug 27, 2024 •

edited

Loading