Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stb_image: Fix 16bits PNM images endianness handling #1710

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Kanma
Copy link

@Kanma Kanma commented Oct 23, 2024

16bits binary PNM images are supposed to use big-endian, even though it wasn't specified in the first versions of the format.

The // convert the image data from big-endian to platform-native comment is copied from the stbi__create_png_image_raw() function. There doesn't seem to be any code there to check what "platform-native" is, it just assumes little-endian, so I did the same.

I can add a small check at both places to determine if the conversion is needed if you want.

16bits binary PNM images are supposed to use big-endian, even though
it wasn't specified in the first versions of the format.
@nothings
Copy link
Owner

What do you mean it assumes platform native is little-endian?

@Kanma
Copy link
Author

Kanma commented Oct 23, 2024

From line

stb/stb_image.h

Line 4827 in 2e2bef4

} else if (depth == 16) {
, the code basically looks like:

if (depth == 16) {
    // big-endian to little-endian conversion
}

instead of something like:

if ((depth == 16) && are_we_on_little_endian_platform()) {
    // big-endian to little-endian conversion
}

There might be an actual check somewhere that I missed though.

@nothings
Copy link
Owner

But the conversion code is not big-endian to little-endian, it's big-endian to platform native. Big-endian to little-endian would just swap the bytes, which is not what this code does.

@Kanma
Copy link
Author

Kanma commented Oct 23, 2024

I might not understand what "platform-native" means in that context then. And admittedly, I don't really know how the stbi__create_png_image_raw() function is used in the whole PNG decoding scheme-of-things.

I only found that comment after a search for "endian" to see if I was missing something.

@nothings
Copy link
Owner

Do you understand what the code you added is doing, or did you just copy-paste it without understanding it?

@nothings
Copy link
Owner

nothings commented Oct 23, 2024

Like, you summarized the existing code as

// big-endian to little-endian conversion

but I don't know why you think that's an accurate summary, when the accurate summary is

// big-endian to platform-native conversion

I'm trying to understand why you think this code converts to little-endian when there's 0 little-endianness code in there.

@Kanma
Copy link
Author

Kanma commented Oct 23, 2024

Big endian is "the most significant byte first", little endian is "the least significant byte first".

For a 16 bits unsigned integer, it means that swapping the two bytes convert from one to the other.

I'm not sure about your definition of "platform-native", but if it means "the way the CPU stores values in memory", then

  • on x86_64, x86 and ARM: "platform-native" == little-endian
  • on PowerPC (for instance): "platform-native" == big-endian

(granted, on modern computers big-endian CPUs probably aren't used anymore).

If "platform-native" means something else, my comment about stbi__create_png_image_raw() being (maybe) wrong is irrelevant and we can concentrate only on my code addition.

Which, given the uint16 values stored in the uint8 array out with the most significant byte first (because it is stored that way in the file, read as-is by stbi__getn()), swap the bytes to end with a little-endian buffer. Which is the correct way on x86_64, x86 and ARM cpus, but not on more exotic ones.

@nothings
Copy link
Owner

nothings commented Oct 23, 2024

Ok, so you don't understand the code you copy-pasted. There is no byte-swapping in the code. It does not convert big-endian to little-endian. It converts big-endian to whatever the natural byte order of the machine is. If the machine is little-endian, it effectively swaps the bytes; if it's big-endian, it has no effect.

@rygorous
Copy link
Collaborator

To explain, little endian means that bytes are in order of increasing significance and big endian that bytes are in decreasing order of significance. For uint16 specifically, little-endian means that the uint16 value implied by two bytes is bytes[0] | (bytes[1] << 8) and big-endian means it's (bytes[0] << 8) | bytes[1].

One way to convert between the two is to swap bytes around manually and then read the value using a native CPU load instruction (which interprets the bytes as either big- or little-endian depending on the CPU architecture and possibly operating state). But that's not what that code is doing. The code in question manually assembles a uint16 value by reading the two bytes and putting them together with the desired significance (in this case big-endian). There is no need to know whether the host platform is little or big endian when doing it this way. Big endian means that the uint16 value is given by (bytes[0] << 8) | bytes[1] and the code just implements that formula directly.

@Kanma
Copy link
Author

Kanma commented Oct 23, 2024

Sorry, but I totally understand the code I wrote. Especially since I write this kind of thing since the 90's. Please stop that.

When using pointer arithmetic and type casting in C, you don't need to use a "native CPU load instruction". None is used in your code, none is used in mine.

At the end of the day, loading a 16 bits PNM file with stbi_load_16() produce complete garbage with the original code, and the exact same uint16_t values than in any software able to display that image (ie. GIMP) with my addition.

I can send you an example image if you need.

@rygorous
Copy link
Collaborator

rygorous commented Oct 23, 2024

Loading a 16-bit PNM with the big-endian conversion path used for PNGs should be completely fine, this whole discussion is just about there not being any assumption in the conversion path about running on a little-endian platform. uint16_t value = (bytes[0] << 8) | bytes[1] will read 16-bit uint values correctly on both big and little endian platforms, there is no assumption of (platform) endianness built in (there is of course an assumption of the value being read stored in BE format). If you look at the bytes of value, on a LE target platform you will get (in this order) bytes[1], bytes[0] and on BE you will get bytes[0], bytes[1]. That is, this code, as written, does (if you look at the bytes in memory) boil down to a byte swap on LE targets but not BE ones (which is exactly what should happen when converting from BE byte order to native). There is no assumption of the target platform being any particular byte order, nor does there need to be.

@Kanma
Copy link
Author

Kanma commented Oct 24, 2024

Ok, so after a good night's sleep... you are right.

It's so obvious now that it is embarrassing.

Sorry about that.

@nothings
Copy link
Owner

No worries, I mostly just wanted to make sure I wasn't missing anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants