New DDS decoder #2258
Nothing short of amazing 🎉
This isn't fully comprehensive, but I do have the time for a bit of a review.
So this is probably a bit larger than we'd like within the main crate. Also, for bc decoding there's already a crate which looks interesting and doesn't even have dependencies: bcdec_rs. As far as is possible, I'd see if the bc.rs and convert.rs contents and parts of decoder.rs might be upstreamed to that crate and then used via a dependency. It's important to reduce the amount of code to be maintained centrally. (Aside: they, as in the author, also seem to have a file/encoding/decoding crate about it, image_dds, so might be interested anyways).
Doing this with f32 is quite slow, so I did it with only integer operations, which are around 3x faster than f32 and around 2x harder to understand.
I'm not sure all the tricks are worth the unfamiliarity burden. Did you benchmark them? End-to-end numbers would be most convincing here. Some floating-point operations are quite fast, unless the integer versions are smaller and can be vectorized better.
How should I test this?
8 MB isn't that much if the files don't change. My primary concern would be whether they are free of copyright risks etc. Still, I'm wondering why the test images are this large?
Should this be in a separate crate?
Well, see above, quite possibly parts. See image-webp. I think image_dds is also owned by the author of the bc decoding crate above, but as you'll note with weezl and zune-jpeg it's totally fine to choose a more unique name anyways.
What should we do with the old DXT implementation? Maybe it could be removed after the new DDS decoder is in?
Yes, thinking so as well.
How careful do I have to be with resource limits?
Not too careful, although adhering to resource limits is always nice. If you can set up fuzzing (see the infrastructure in png and webp), this might also yield some very early detection of whether the memory limits are upheld. If it's within some constant factor above the configured one, I wouldn't worry. In png we've set the software limit of the decoder to half the fuzzer's own limit iirc.
Thanks for the quick response!
However, I don't think that the things in On the topic of I also want to mention that I don't intend to maintain this code, so if you want to make it a separate crate, I would ask to put it under the
Copyright won't be a concern for most images. I tested most formats with a 257x131 test image I created myself, and then used texconv to create DDS files in different formats. This is the image btw:
However, there are some images for which copyright will be an issue. For some less-common formats, I only have one image per format that I found in some games (e.g. Elden Ring and the Dark Souls games). Given that all of these formats are uncompressed (no fancy block compression), I could write some scripts to generate images in those formats. Non-standard flags could also be tested by just hex-editing the headers of similar DDS files.
As for the size: most DDS formats are uncompressed, so even my little test image produces a 174KB DDS file when saved with 8bpc RGBA color. Take that times 100 files and that's why. My main way of reducing the size would be to use an even smaller test image btw.
I did a quick benchmark for a DX10 B5G6R5_UNORM image. (This format basically just does a range conversion (the same one BC1 uses) and that's it, so it's the ideal format to test this with.) I also found that the
For the f32 conversion I used this snippet (similar for

```rust
#[inline(always)]
pub(crate) fn x5_to_x8(x: u16) -> u8 {
    debug_assert!(x < 32);
    const FACTOR: f32 = 255.0 / 31.0;
    (x as f32 * FACTOR + 0.5) as u8
}
```

So I already optimized to multiply with a single constant and use the truncation So I think I'm going to switch to the trick And I think I can also find constants for the same trick to use in other range conversion functions. I don't know how they derived their constants, but they are small enough that I can use brute force to find more.
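The brute-force idea mentioned above could look roughly like the following. This is only a sketch under assumptions: the multiply-add-shift form `(x * mul + add) >> shift`, the search bounds, and the function names `reference`, `is_exact`, and `find_constants` are all hypothetical and not taken from the PR.

```rust
/// Reference conversion: round(x * 255 / max_in), computed exactly
/// with integers (no ties occur for max_in = 31 or 63).
fn reference(x: u32, max_in: u32) -> u32 {
    (2 * 255 * x + max_in) / (2 * max_in)
}

/// True if `(x * mul + add) >> shift` matches the reference
/// for every input in 0..=max_in.
fn is_exact(mul: u32, add: u32, shift: u32, max_in: u32) -> bool {
    (0..=max_in).all(|x| (x * mul + add) >> shift == reference(x, max_in))
}

/// Brute-force search, smallest shift first. The bounds (shift <= 10,
/// mul just above 256 * 2^shift / max_in) are assumptions chosen to
/// keep the search small.
fn find_constants(max_in: u32) -> Option<(u32, u32, u32)> {
    for shift in 0..=10u32 {
        let mul_max = (256u32 << shift) / max_in + 1;
        for mul in 1..=mul_max {
            for add in 0..(1u32 << shift) {
                if is_exact(mul, add, shift, max_in) {
                    return Some((mul, add, shift));
                }
            }
        }
    }
    None
}
```

Calling `find_constants(31)` searches constants for the 5-bit case and `find_constants(63)` for the 6-bit case. Because the input range is tiny, every candidate can be checked exhaustively, so any triple the search returns is exact by construction. As a cross-check, the commonly cited 5-bit constants `(527, 23, 6)` also pass `is_exact`; whether those are the constants referred to in this thread is not something the sketch assumes.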
Alright, so it's pretty much ready now. Updates:
There are just a few things we have to talk about:
I've also run into a problem with testing: How do I test RGB32F images?
Sorry for the delay! I mostly finished up the PR now.
However, I need help with testing the floating-point formats. I don't know how to generate reference images for them, since Otherwise, this PR is ready as I see it. The new DDS decoder is correct, decently performant, and covers most DDS formats. @HeroicKatora Please let me know what else needs to be done/changed.
I made a new DDS decoder, because the old DXT-based decoder was very limited (only DXT1-5 + dimensions divisible by 4) and incorrect (DXT1 colors were not rounded correctly, resulting in discolorations). While this PR is not finished yet, I already implemented the following features:
The only main formats that are still missing are BC6 and BC7. These 2 are quite complex, so it will likely take me some more time to implement them. Once those 2 are implemented, this should be a pretty competent DDS decoder.
Some notes and technical decisions:
&mut dyn Read in all format decoders to reduce binary size. Since there are a lot of DDS formats, there are a lot of functions to decode these formats. I think the binary size could be reduced even further, so I'm very open to feedback in that regard.
(x as f32 / 31.0 * 255.0).round() as u8 (= first convert to an f32 value in the range 0-1, and then convert to u8 with rounding). Doing this with f32 is quite slow, so I did it with only integer operations, which are around 3x faster than f32 and around 2x harder to understand. The tricks I use are explained in x5_to_x8 in convert.rs.
With what I did so far out of the way, I have some questions to the maintainers on how to integrate this into the
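To illustrate the `&mut dyn Read` note, here is a minimal sketch of the pattern: the per-format decoder takes a trait object, so it is compiled once, and only a thin generic shim is monomorphized per reader type. The names `scale`, `decode_b5g6r5`, and `decode` are hypothetical, and the B5G6R5 layout (blue in the low 5 bits, red in the high 5 bits, little-endian) is used purely as an example format; none of this is the PR's actual code.

```rust
use std::io::{self, Read};

/// Integer form of round(x * 255 / max); an illustrative helper.
fn scale(x: u32, max: u32) -> u8 {
    ((2 * 255 * x + max) / (2 * max)) as u8
}

/// Hypothetical per-format decoder. Taking `&mut dyn Read` means this
/// body is compiled once, no matter how many reader types call it.
fn decode_b5g6r5(reader: &mut dyn Read, pixels: usize, out: &mut Vec<u8>) -> io::Result<()> {
    let mut buf = [0u8; 2];
    for _ in 0..pixels {
        reader.read_exact(&mut buf)?;
        let v = u32::from(u16::from_le_bytes(buf));
        // Red: bits 11-15, green: bits 5-10, blue: bits 0-4.
        out.push(scale((v >> 11) & 0x1F, 31));
        out.push(scale((v >> 5) & 0x3F, 63));
        out.push(scale(v & 0x1F, 31));
    }
    Ok(())
}

/// Generic entry point: only this thin shim is duplicated per `R`.
pub fn decode<R: Read>(mut reader: R, pixels: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::with_capacity(pixels * 3);
    decode_b5g6r5(&mut reader, pixels, &mut out)?;
    Ok(out)
}
```

The trade-off is one virtual call per `read_exact`, which is why a real decoder would likely read whole rows or blocks at a time rather than single pixels as this sketch does.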
image crate:
tests/reference_image.rs > render_images to ensure that the decoder works correctly, but 20MB is a lot of image data to commit to a repo. I can go down to around 8MB, but not much lower. The issue is that I only have one file for some rare formats and no way to generate smaller images of those formats.
image crate
O(sqrt(N)) (N=number of pixels) additional memory for roughly square images. However, an attacker could supply an image with height=1, causing the temporary buffer to be as large as (but no larger than) the output buffer. I already limited the width and height of DDS images to be at most 2^24, so the temporary buffer can be at most 256MB (with any R32G32B32A32_* format).
Also, (not a question) this is my first large-scale Rust PR, so please feel free to pick apart and suggest improvements to everything you see.
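The 256MB bound follows from 2^24 pixels times 16 bytes per pixel for a 128-bit format. A small sketch of how such a check might be written is below. It assumes the temporary buffer holds one row of pixels, which is a guess based on the O(sqrt(N)) description, and `MAX_DIMENSION` and `row_buffer_size` are hypothetical names rather than the PR's API.

```rust
/// Assumed cap on width and height, mirroring the 2^24 limit above.
const MAX_DIMENSION: u32 = 1 << 24;

/// Hypothetical helper: bytes needed for a one-row temporary buffer,
/// or `None` if the dimensions or the configured limit are exceeded.
fn row_buffer_size(width: u32, height: u32, bytes_per_pixel: u32, limit: u64) -> Option<usize> {
    if width == 0 || height == 0 || width > MAX_DIMENSION || height > MAX_DIMENSION {
        return None;
    }
    // Widen to u64 and use checked arithmetic so the product cannot wrap.
    let bytes = u64::from(width).checked_mul(u64::from(bytes_per_pixel))?;
    if bytes > limit {
        return None;
    }
    usize::try_from(bytes).ok()
}
```

With the worst case from the description (width 2^24, height 1, 16 bytes per pixel) and no configured limit, this yields exactly 256 MiB, matching the figure quoted above.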