Fix YUV->RGB shader (YCbCr->RGB really).
WebRTC SDK got it wrong. See https://web.archive.org/web/20180421030430/http://www.equasys.de/colorconversion.html
@john-preston, this should work mostly fine for BT.601-encoded videos, but won't work correctly for videos encoded with BT.709 or BT.2020. I say 'mostly fine' only because you missed the `0` in `2.017` on line 85 (now line 88 in the current version of the file, as of 2022-04-02).

Personally, I would recommend performing a matrix multiply and using vectors instead of individual float values. GPUs can better optimize and parallelize operations when GLSL code is organized to leverage native data types and sizes, with relevant data grouped together.
Unfortunately, it also looks like the GLSL has been heavily split apart into multiple pieces now, which makes it harder to find and fix this type of thing, especially with single-letter variables that are now separated from where they're defined (making it unclear what exactly they represent).
However, it's also a good thing it's split up so much - that sets a precedent for defining multiple alternate YUV→RGB conversion matrices, which can be used depending on the metadata or resolution of the data. Video codecs like H.264 can indicate which variation of YUV was used to encode the video, which affects which variation to use when decoding it.
Here's an example of how I'd personally organize this (note: the fractions are simply more precise versions of the matrix coefficients you used). The sampling would go in `FragmentSampleYUV420Texture()` and the conversion in `FragmentYUV2RGB()`, though this is just an example of the desired format; this would be applied to the other variations as well.
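A sketch of the shape this could take (the sampler names are assumptions, based on the three separate plane textures mentioned below; the fractions are exact versions of the rounded coefficients):

```glsl
uniform sampler2D y_texture; // assumed names for the three plane textures
uniform sampler2D u_texture;
uniform sampler2D v_texture;

// Gather the three planes into one vec3 so the later math is vectorized.
vec3 FragmentSampleYUV420Texture(vec2 texcoord) {
	return vec3(
		texture2D(y_texture, texcoord).r,
		texture2D(u_texture, texcoord).r,
		texture2D(v_texture, texcoord).r);
}

// BT.601 tv-range YCbCr -> RGB. GLSL matrices are column-major, so each
// line below is a column: the contributions of Y, Cb and Cr respectively.
// 255/219 and 255/224 are the tv-range luma and chroma scales, and
// 0.202008 = 2*Kb*(1-Kb), 0.419198 = 2*Kr*(1-Kr), 0.587 = Kg.
const mat3 kBt601TvToRgb = mat3(
	vec3(255. / 219.),
	vec3(0., -0.202008 / 0.587, 1.772) * (255. / 224.),
	vec3(1.402, -0.419198 / 0.587, 0.) * (255. / 224.));

vec3 FragmentYUV2RGB(vec3 yuv) {
	return kBt601TvToRgb * (yuv - vec3(16., 128., 128.) / 255.);
}
```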
Generally, if a video file isn't tagged, resolutions lower than some cutoff (chosen arbitrarily to be somewhere between 480p and 720p) are treated as BT.601, and anything higher than that arbitrary cutoff is treated as BT.709. BT.2020 is significantly different and requires further processing, especially if it's the 'constant luminance' variation.
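For example (a sketch carrying over the names from above; the uniform, its values, and the exact cutoff are my own assumptions, not anything the file formats dictate):

```glsl
// BT.709 tv-range matrix, built the same way (Kr = 0.2126, Kb = 0.0722).
const mat3 kBt709TvToRgb = mat3(
	vec3(255. / 219.),
	vec3(0., -0.133974 / 0.7152, 1.8556) * (255. / 224.),
	vec3(1.5748, -0.334800 / 0.7152, 0.) * (255. / 224.));

// Set by the host from codec metadata when present; for untagged video,
// from the frame height (e.g. < 720 lines -> 0 = BT.601, else 1 = BT.709).
uniform int u_colorMatrix;

vec3 FragmentYUV2RGB(vec3 yuv) {
	mat3 m = (u_colorMatrix == 1) ? kBt709TvToRgb : kBt601TvToRgb;
	return m * (yuv - vec3(16., 128., 128.) / 255.);
}
```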
Other colorspace metadata that mp4 and mov files might include (and I think webm can as well, but I'm not as sure about that) concerns the RGB colorspace, primarily the gamma correction standard used and the color primaries. That's all beyond the scope of what I'm talking about here, though it's a topic I've studied in full and already have extensive experience implementing in GLSL.
I apologize if the above comment caused 3 notifications to be sent off. I accidentally hit Ctrl+Enter twice while still writing it, my finger slipping off of 'Shift' (which copies the whitespace from the previous line to the new line being created, useful while writing those code snippets). I then promptly copied the contents of the incomplete posts, deleted the posts, and kept working.
Alternatively, if you want to avoid the weirdness of wrapping the operation at the end in `vec4` with the trailing `1`, you could instead turn those `mat3`s into `mat4`s like this:
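```glsl
// A sketch, keeping the names from the mat3 version above: the same
// tv-range BT.601 matrix widened to a mat4. The 1 in the bottom corner
// makes the product a complete RGBA value with alpha = 1.
const mat4 kBt601TvToRgb = mat4(
	vec4(vec3(255. / 219.), 0.),                                   // Y
	vec4(vec3(0., -0.202008 / 0.587, 1.772) * (255. / 224.), 0.),  // Cb
	vec4(vec3(1.402, -0.419198 / 0.587, 0.) * (255. / 224.), 0.),  // Cr
	vec4(0., 0., 0., 1.));                                         // alpha
```

And then the last couple fragments become:

```glsl
varying vec2 v_texcoord; // assumed varying name

vec4 FragmentYUV2RGB(vec3 yuv) {
	return kBt601TvToRgb * vec4(yuv - vec3(16., 128., 128.) / 255., 1.);
}

void main() {
	gl_FragColor = FragmentYUV2RGB(FragmentSampleYUV420Texture(v_texcoord));
}
```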
While I'm tempted to bring up the weirdness of how you have 3 separate textures for the image data, you guys probably have your reasons for it (I'm guessing something to do with receiving separate planes from the decoder? I've never messed with actually displaying decoded video with GLSL, I'm just a colorspace nurd).
There might be other things I could comment on, but the only other thing that comes to mind is that perhaps instead of hard-coding each possible matrix into the code, you could have sets of YUV coefficients and generate each matrix from those sets (see the sketch after this paragraph). That makes it easier to support multiple variations of YUV, and only 2 numbers are needed per variation (`Kr` and `Kb`; technically `Kg` is needed too, but it's always `1 - Kr - Kb`. Weellll, teechnically you could calculate any of them given the other 2, but Wikipedia seems to only show `Kr` and `Kb` values for some of them, so it's just easier to consider those to be the givens).
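Something like this, for instance (a sketch; the helper name is mine):

```glsl
// Build a tv-range YCbCr -> RGB matrix from just Kr and Kb
// (Kg is always 1 - Kr - Kb).
mat3 YuvToRgbMatrix(float kr, float kb) {
	float kg = 1. - kr - kb;
	return mat3(
		vec3(255. / 219.),
		vec3(0., -2. * kb * (1. - kb) / kg, 2. * (1. - kb)) * (255. / 224.),
		vec3(2. * (1. - kr), -2. * kr * (1. - kr) / kg, 0.) * (255. / 224.));
}

// YuvToRgbMatrix(0.299, 0.114) reproduces the BT.601 matrix above;
// YuvToRgbMatrix(0.2126, 0.0722) gives BT.709.
```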
You might really want to consider doing this, so that you can also support the case where the metadata in a file states to use the 'full' (or 'pc') range instead of the 'limited' (or 'tv') range. When such data is provided, the vector being subtracted at the end of the 'decoding' fragment changes to `vec3(0, 128, 128) / 255`, the top row of values in the matrices all turn to `1.0`, and the other values in the matrix are scaled differently. BT.601's matrix, for example, becomes (sketched below with my naming):
You might recognize a couple of those numbers as being similar to - but not exactly - the wrong values used previously. I don't know why they have `1.403` instead of `1.402`, nor do I know why they'd truncate `1.772` to `1.77` (they're weirdly inconsistent with the number of decimal places they use), but basically the values they used were for 'pc'-scaled YUV values, not 'tv'-scaled values. If I don't use fractions, it becomes (approximately):
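```glsl
// The same full-range matrix with the fractions evaluated out:
const mat3 kBt601PcToRgb = mat3(
	vec3(1.),
	vec3(0., -0.344, 1.772),
	vec3(1.402, -0.714, 0.));
```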