10 bit per channel proposal #357
Comments
A couple thoughts:
|
Something I personally would like to add is both color models beyond RGB and non-color channels. I don't know if others want to add support for this. We haven't discussed it yet. |
Hi @ProgramMax Sure, normals, motion vectors, 3 channel 10bit alphas, any of the independent lighting passes, zdepth... So, in #3 of my second post, I indicate how the For an 8 bit container, that's 128, but for 10 bit it's 512. So for that data type, we might prefer to take one top and one bottom bit, then for the legacy 8 bit decoders truncating, the center would be at 128, so it would look like any other normals map when viewed as the truncated 8bit. Though if a CGI program were to read it in as if it were an 8bit normal, there may be unexpected results. But as far as an image viewer goes, it should be no different than using those channels for normals and such now, or motion vectors, or whatnot. As I mentioned, I'm thinking the Type ID Tho Rereading your post, perhaps you're suggesting more about having an ID chunk? For that, tEXt should work, or exif? Though I suppose a dedicated type-ID chunk might lead the way to a more common interchange file standard for things like normals (which can be different for different apps/engines/etc.) The next question is, does any app/engine use 10 bit normals, or the others? I can't remember ever doing so... Even if the image data is 10 bit, it's not uncommon to pair it with an 8 bit alpha tiff, for instance. 10bit is popular for transport & streaming. It was, once upon a time, popular for intermediates in film work, though EXR (16 bit half float) is a better route for post and VFX work, and finishing for that matter. EXR uses float for image and alpha data, is inherently linear (gamma 1.0) and can handle n-channels of data, including int data (you can mix 16bit float, 32 bit float, and 32bit unsigned int, etc etc...) On the other hand, 10 bit per chan is a significant improvement in sample size for a pixel, at minimal cost if you don't need float or a full alpha (clearly cheaper than 16bit in a few ways). |
Right. My thought is if someone opens a PNG which holds normal for a 3D model, the image data itself isn't terribly useful. They aren't seeing a real image in the usual sense. If we completely swapped the "red" and "blue" channels, would they care? Probably not. So we could completely change how that appears without it feeling broken suddenly. My thought is to add a new chunk (not having some magic text inside tEXT) that specified what each channel of data represents. This chunk could be marked as required-to-view. And conforming image viewers will no longer show the data as if it was plain old RGB. I feel like this is the most correct approach. But this would mean an image which was previously viewable is no longer viewable. And maybe that counts as breaking the viewer? My vote is no, since the data wasn't meant to be viewed as RGB anyway. If we're able to specify what each channel's meaning is, it would make sense to also specify each channel's data format (EG 10-bit int). The details of how the data formats are packed are more along your original comment. What you propose makes a lot of sense to me for 10-bit int. |
Hi @ProgramMax
I find it useful....as someone who does 3D cgi, animation, and compositing... and I'd say the same for motion vector files and in fact most data types that use image containers, and I'd reckon most of the other VFX artists I know would echo that... Just looking at it I can tell if it's a normals map, a motion vector, a bump, etc... and when dealing with a folder full of files, this is helpful. I do agree if you are implying possible future viewers might be able to parse normals, motion vectors, occlusions, etc etc, and present them in a viewer that permitted applying that data and/or compositing it, at least as a proxy, onto some other image. That could be useful. But I don't think it's necessary to mute the output per se, it isn't noise/raw data that displays as garbage... And even that is desirable at times, depending. But most of the stuff we stick into image containers, even if ostensibly "data", we stick in an image container because it is essentially a form of or derivative of image data, and seeing it we can see exactly what it's all about. That said, I do like the idea of labeling each channel, especially for such data formats. And a specific chunk like that would be useful for the 16 bit RGBA also... |
I think we agree. I should clarify. It is useful for those purposes (normals, motion, etc). And it is useful for people in the know to see a weird-looking image and say "That is definitely normal data". The guideline I've been using (could be wrong) for whether or not a chunk should be ancillary is: would a picture of an apple still convey as a picture of an apple without this chunk being understood? For example, perhaps a colorspace chunk is not understood by a given image viewer. But it's still a red apple. The image isn't conveyed 100% correctly. The red will be slightly off. But it is correct enough to be useful to the average user. But for normals, motion, etc the whole apple & average user analogy breaks down. To them, the image data isn't useful. To them, there was never anything to see. There was no apple. So if a new chunk breaks existing viewers--but only for images that the average person found useless--does it matter? You have a great point though that people in the know still get value out of seeing that image. It would put a burden on them to use an image viewer that either understands the new chunk or ignores PNG ancillary rules. That might not be a large ask for a person in the know. But it is still worth considering. |
Oh I see what you're saying... But... should a new chunk break anything? Shouldn't any viewer just ignore chunks it doesn't understand? I would hope they would, if only for stability... |
This had me thinking. Here are the thoughts: a mini float, and a way to put 12bit PQ into an 8bpc container.
Edit: Updated chart, changed direction here a bit, working on a dynamic asymmetrical bias for the exponents, need to work out subnorm numbers for instance. But by pencil tests, seems like we're getting 11-12 bit performance... maybe... Far from finished. Meanwhile here's the updated map:
Below is under revision due to some ideas that developed, will show soon.
Mini Float png
Proposed:
A mini float format wedged into an 8 bit per channel RGBA png. Each 8
In the 8 repurposed alpha bits, the MSb is a 1 bit alpha, next is a
Asymmetrical bias with a per-pixel control of signed/unsigned mode, 4 bit exponent in unsigned mode, and 3 bit exponent in signed mode. The
The top MSb of the exponent is either a 4th exponent bit for positive
Bit segmenting is the two MSb of the exponent, the full mantissa is in
Significand (Mantissa) is 7 bits (6 bits explicit, 1 implied).
Bias and Base can be arbitrary
Set in the
An arbitrary base can be used
Considerations
Summary
|
YPQUV png
Proposed:
This converts 12bit per channel PQ RGB to YUV, where Y is PQ gamma at 12
The advantage is maintaining the essential luminance resolution, but
Coefficients are applied to the gamma encoded RGB tuples for
In the layout below, the V is inverted to -V, essentially putting blue
Advantages
YPQUV 422 png
Proposed:
This converts 12bit per channel PQ RGB to YUV, where Y is PQ gamma at 12
The advantage is maintaining the essential luminance resolution, but
Coefficients are applied to the gamma encoded RGB tuples for
Unlike the above version (12 10 10), this version will not be
It should compress well, as the vertically adjacent pixels will both be either U or V type, so though the horizontally adjacent pixels alternate U or V, the prefilter should in theory select the vertically adjacent pixel for the deltas.
UVUVUV
An alternate scheme is, horizontally UUVVUUVV, and offsetting each line by 1 pixel as:
The stagger should progress right:
UUVVUUVV
This way a U (or V) will always have a U (or V) either above or to the left, and
Among other things, if the prefilter selects the linear (A+B+C)/3 mode, two of the three pixels will always be the same type as the present pixel. Find delta for a V pixel, and the adjacent A,B,C pixels.
Advantages
Unknown Issues
Sampling U and V at half the spatial resolution is a common strategy, as the Y holds all the important spatial detail. This is in accordance with the human vision system's handling of hue/chroma at a third or less the resolution of luminance. Rather than UV, other color difference modes could of course be used... But thinking efficiency: simplest transforms, avoid any unnecessary math for converting from YUV to RGB.. The question is, is LZ77 decompression fast enough for a streaming use case... |
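The staggered UUVV layout described in the 422 proposal can be sketched as follows. This is only an illustration under an assumption: the exact stagger is not spelled out in the surviving text, so the sketch assumes each scanline is the previous one shifted right by one pixel. The function names are hypothetical. Under that assumption, every pixel that is not on the top row or left column has a same-type chroma sample either directly above or directly to the left.

```python
def chroma_type(x, y):
    """Return 'U' or 'V' for pixel (x, y).

    Assumption: row 0 reads UUVVUUVV..., and each following row is the
    previous row shifted right by one pixel.
    """
    return "U" if (x - y) % 4 < 2 else "V"


def row(y, width):
    """Render one scanline of the chroma layout as a string."""
    return "".join(chroma_type(x, y) for x in range(width))


def has_same_type_neighbour(x, y):
    """True if the pixel to the left or the pixel above has the same type."""
    t = chroma_type(x, y)
    return chroma_type(x - 1, y) == t or chroma_type(x, y - 1) == t


if __name__ == "__main__":
    for y in range(4):
        print(row(y, 8))
```

A different stagger direction would need the sign in `chroma_type` flipped, and the neighbour property would then have to be rechecked.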
To test compression efficiency you could do the bit shuffling yourself, encode as 8-bit truecolor-alpha and compare file size to the next closest thing (16-bit truecolor if we ignore the 2-bit alpha). Just make sure the test images are actually 10-bit, if they're upscaled from 8-bit and the LSB's stuffed into the alpha channel are all the same it's gonna compress better than it should. |
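A rough way to run the comparison suggested above, as a sketch only: it deflates raw sample buffers with `zlib` and skips PNG scanline filtering entirely, so real PNG file sizes will differ. The alpha byte layout (2 alpha bits, then the 2 LSbs of R, G, B) and all function names are assumptions for illustration. The 16-bit buffer scales 10-bit samples by bit replication and stores them big-endian, as PNG does.

```python
import struct
import zlib


def as_rgba8(pixels, alpha2=3):
    """Pack 10-bit (r, g, b) tuples as 8-bit RGBA, LSbs stuffed into alpha.

    Assumed alpha byte layout: AA RR GG BB (2-bit alpha in the MSbs).
    """
    out = bytearray()
    for r, g, b in pixels:
        a = (alpha2 << 6) | ((r & 3) << 4) | ((g & 3) << 2) | (b & 3)
        out += bytes((r >> 2, g >> 2, b >> 2, a))
    return bytes(out)


def as_rgb16(pixels):
    """Store 10-bit samples as 16-bit big-endian RGB (10-in-16)."""
    out = bytearray()
    for px in pixels:
        for v in px:
            # Scale 10 bits to 16 bits by replicating the top bits.
            out += struct.pack(">H", (v << 6) | (v >> 4))
    return bytes(out)


def compare_sizes(pixels, level=9):
    """Return (deflated RGBA8 size, deflated RGB16 size) in bytes."""
    return (len(zlib.compress(as_rgba8(pixels), level)),
            len(zlib.compress(as_rgb16(pixels), level)))
```

To use it, feed `compare_sizes` a list of genuine 10-bit `(r, g, b)` samples, for example decoded from DPX frames. Synthetic or upscaled 8-bit data would skew the result for the reason given above.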
Yes, I have a lot of 10 bit material, currently as DPX files. It should be straightforward to modify a png library for this testing, to see if there is value in proceeding... looking for a suitable JS png lib... |
10 bit isn't particularly general. One of the core problems of PNG is that the channel encoders are not separated, so patterns in the channels get obfuscated in the higher (or is it lower?) level LZ77 compression. The general solution is a bit-agnostic encoder in each channel and a good model of the data being encoded. This, of course, sounds like LZW :-) But not really; the problem is the data model (RGBA, linear, equal precision) not the compression technique. The productive approach is not specific ad hoc encodings but consideration of the underlying data model. I suggest it is CIELuv. YCbCr is an example of an encoding more appropriate to CIELuv than RGB. PNG (or JPEG, or TIFF) isn't beholden to the specific encoding it uses. An optimal encoding does not require any relationship to the nominal encoding; for example RGB might be optimally encoded as CIELuv then decoded back into RGB without loss. The problem is that PNG is beholden to the mindset of computer programmers and to their approach. The basic science is clear; humans have limited discrimination of colour but a remarkable ability to perceive luminance variations over a massive range (11D to 16D?) The colour part of that range is about 8D and within that range how many colors are there? (Anyone who says "all the colors of the rainbow" please step outside and stick your head in a bucket of water; trichromats only see seven colors in the rainbow and I can only see six of them). The key is getting the right model first then developing the encoding. PNG makes the approach simple: the IHDR contains a "color type" field and a "compression" field. Either or both have private values; anything >=128. Just do it; this is a well established approach. Use the "private" definitions to implement and use a better encoding/model, standardization will follow. Don't standardize from the altar! |
Hi John @jbowler
I'm not sure what you mean. 10 bit per channel is the second most used bit depth after 8 bit for images and especially for streaming image sequences/video. 10bit per channel is very common, and required for higher gamuts and higher dynamic range methods. I'm not certain about the remainder of the post: .png is a defined standard, with a defined method for data compression. The 10-bit png I proposed is designed specifically to work within the existing framework with minimal issues, and maximum data compression given the existing paradigm, without creating an entirely different "not .png" model. The other potential encodings I should probably delete from this thread as they may be causing confusion. |
10bit-png
Proposed: TENB (TeNB interim)
A proposal for a compression efficient 10bit variant of png
While it is possible to use the sBIT chunk to put a 10bit per channel image into a 16bit container.....
This problem spawned an idea.....
Not your granddad's DPX
When I stumbled onto this thread, first thought was the existing three channels of 10bit into four bytes format of DPX, but I don't think that fits well into png.
The Alpha Wolf Shares with the Pack
But if there is a desire to conserve bandwidth, it occurred to me that the color type 6 8bit rgba png might easily be modified so the two LSbits of each 10bit channel are mapped onto the 6 LSbs of the alpha channel, and the 2 MSbs of the alpha could still be used for a one or two bit alpha.
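A minimal sketch of that bit mapping for one pixel. The order of the stuffed bits inside the alpha byte is an assumption here (2 alpha bits, then the 2 LSbs of R, then G, then B), since the layout graphic is not reproduced in the text. The function names are hypothetical.

```python
def pack_segmented10(r, g, b, alpha2=3):
    """Pack three 10-bit samples and a 2-bit alpha into 4 bytes (R, G, B, A).

    Assumed alpha byte layout, MSb first: AA RR GG BB.
    """
    for v in (r, g, b):
        if not 0 <= v <= 1023:
            raise ValueError("samples must be 10-bit (0..1023)")
    if not 0 <= alpha2 <= 3:
        raise ValueError("alpha must be 2-bit (0..3)")
    a = (alpha2 << 6) | ((r & 3) << 4) | ((g & 3) << 2) | (b & 3)
    # The RGB bytes carry the 8 MSbs of each sample.
    return bytes((r >> 2, g >> 2, b >> 2, a))


def unpack_segmented10(px):
    """Recover (r, g, b, alpha2) from 4 packed bytes."""
    r8, g8, b8, a = px
    r = (r8 << 2) | ((a >> 4) & 3)
    g = (g8 << 2) | ((a >> 2) & 3)
    b = (b8 << 2) | (a & 3)
    return r, g, b, a >> 6
```

The round trip is lossless: `unpack_segmented10(pack_segmented10(r, g, b, a))` returns the original values for any valid input.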
A two bit alpha could be combined with the tRNS chunk to have 4 indexed transparency values (though that may cause compatibility issues), otherwise, 0b00 = 0% opaque, 0b01 = 33%, 0b10 = 66%, 0b11 = 100% opaque.
Fall Backwards (compatibility)?
The next question is, is there a configuration where a decoder/viewer that is not capable of handling this segmented10bit format just discards the bits in the alpha, as if fully opaque? In this case, the LSbs would be truncated, and while truncation is a poor way to handle down sampling, and does have artifacts, the image would still be reasonably viewable.
The sBIT chunk provides the way to make this happen, to mask the LSbs in the alpha and also show the 2 MSbs of transparency, so a 10 bit image could display with a reasonable fallback in a naïve/legacy viewer.
With sBIT set to 8 8 8 2, current decoders/viewers should display the truncated-to-8 image okay, and with the alpha sBIT at 2 bits, only the two MSbs would be used, with the 6 LSb rgb bits being hidden.
Virtual Signaling
As per the graphic below, the IHDR chunk would indicate 8 bit and color type 6, for fallback compatibility. So how to tell the decoder we're a segmented 10bit image? We use a tEXt chunk with one string that says segmented10bit to signal the format, and this maintains backwards compatibility.
Advantages
A 10bit png format with all the advantages of png, but a bit depth to match many current video and HDR formats.
Should compress similarly to an 8bit RGBA png. First three bytes are not unlike an 8 bit image, then the one LSb/alpha byte, which might not compress as small as a typical alpha, but overall this scheme should be substantially more efficient than 10-in-16.
So long as the decoder supports sBIT, this should be backwards compatible, with the caveat that images with truncated LSbs may have artifacts.
And finally, though tests need to be run, this seems like an efficient way to handle 10bit images as far as the compression and total data size is concerned.
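The fallback behaviour can be illustrated with a small sketch of what a legacy 8-bit viewer might reconstruct from one packed pixel. It assumes the viewer honors sBIT 8 8 8 2 by keeping only the two MSbs of alpha and rescaling them by bit replication. Actual viewers may differ, and one that ignores sBIT would use the whole alpha byte, stuffed LSbs included. The function names are hypothetical.

```python
def legacy_view(px):
    """Approximate an sBIT-aware 8-bit viewer's result for 4 packed bytes.

    Assumption: alpha sBIT = 2 means only the top 2 alpha bits count,
    rescaled to 8 bits by replication (0, 85, 170, 255).
    """
    r8, g8, b8, a = px
    alpha2 = a >> 6
    return r8, g8, b8, alpha2 * 85


def legacy_view_ignoring_sbit(px):
    """A viewer that ignores sBIT uses the full alpha byte as-is."""
    r8, g8, b8, a = px
    return r8, g8, b8, a


def truncation_error(v10):
    """10-bit code values lost when a sample is truncated to its 8 MSbs."""
    return v10 & 3
```

For example, the packed pixel `(255, 128, 0, 177)` carries a 2-bit alpha of `0b10`, which this sketch shows as alpha 170 (about 66% opaque), while a viewer ignoring sBIT would show alpha 177. The per-sample truncation loss is at most 3 of 1023 code values.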
I started a repo to commence work on this if there is interest.
Thank you for reading.
Andrew Somers
Director of Research
Inclusive Reading Technologies, Inc.