[FFmpeg-user] Using YCoCg, ICtCp, and RGB color spaces

Chema Gonzalez chema at berkeley.edu
Fri Aug 13 03:49:01 EEST 2021


I'm trying to understand how pix_fmt and colorspace work together when
encoding using h264 and h265 (x264 and x265 codecs, respectively).
This is a very long email, but I feel I need to provide context for
what I'm trying to understand. I added several questions in the email.

My original goal was to get some videos encoded with h265 and the RGB,
YCoCg, and ICtCp color spaces, to understand the effect of the
different color spaces in the actual encoding sizes. I was expecting
the following:

* ICtCp should reduce the encoded size of a file, but only a small
amount. The color space is supposed to be optimized to add all the
video energy to the I component, and leave the Ct and Cp ones with
very little energy
* YCoCg should increase the encoded size of a file, but only a small
amount. The color space is interesting because the conversion
operations are very simple (no FP ops, just shifts and adds/subs).
* RGB should increase the encoded size of the file significantly when
compared to YUV444. I am expecting up to 3x (if we assume the RGB->YUV
axis rotation puts all the energy on the Y, and therefore the chromas
are free to encode).
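
To make the "shifts and adds/subs" point concrete, here is a minimal sketch of the reversible lifting variant of the transform (commonly called YCoCg-R). The function names are made up for this illustration, and this is not necessarily the exact matrix that an encoder or ffmpeg would apply for the YCoCg color space.

```python
def rgb_to_ycocg_r(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Forward YCoCg-R lifting transform: integer adds, subs and shifts only."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y: int, co: int, cg: int) -> tuple[int, int, int]:
    """Inverse transform: undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Note that Co and Cg can be negative and need one more bit than the input components, so a real pipeline would have to add an offset or widen the sample depth.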

First I checked the standards: The 3x color spaces are part of the
h265 standard (described in Table E.5 of the 2018/02 standard). The
first 2 (RGB and YCoCg) are part of the h264 standard (described in
Table E.5 of the 2016/02 standard).

So I tried to encode some h265 files. I tried the "-colorspace
<colorspace>" option, alongside "yuv444p" as the pixel format (I want
to avoid the effect of chroma subsampling in the final file sizes).

First I get the yardstick. Input is a 15-second excerpt from Tears of
Steel (1280x800).

$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv420p -qp 30 out.yuv420p.265
Step 1.

The file is 1701366 bytes.

Let's see the effect of chroma subsampling:

$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv444p -qp 30 out.yuv444p.265
Step 2.

The file is 1706445 bytes, or 1.003x. While I understand most of the
benefit of chroma subsampling is due to the RGB->YUV conversion, the
number feels on the small side.

I checked the contents of the .265 files (using
[h265nal](https://github.com/chemag/h265nal)). I run the following

$ h265nal --noas-one-line --add-length --add-checksum out.yuv444p.265 > out.yuv444p.265.txt
Step 3.

And then diff the outputs.

I can see the following differences (there are more, but I thought
these were the interesting ones):

* general_profile_idc is 1 for yuv420p, and 4 for yuv444p (both VPS and SPS)
* chroma_format_idc is 1 and 3, respectively (4:2:0 and 4:4:4)
* all the slice payloads are different

This looks legit. Clearly the first file is 4:2:0, and the second
4:4:4, and the contents are different.
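
For readers without h265nal, the "which NALUs differ" comparison can be approximated with a few lines of Python. This is only a rough sketch under two assumptions: the files are raw Annex B byte streams (as ffmpeg writes for `.265` output), and both files contain the same number of NALUs in the same order. All function names are invented for this example.

```python
import re


def split_nal_units(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    # Every NAL unit is preceded by the 3-byte start code 00 00 01
    # (a 4-byte start code is the same thing with one extra leading zero).
    starts = [m.end() for m in re.finditer(b"\x00\x00\x01", stream)]
    units = []
    for i, begin in enumerate(starts):
        end = starts[i + 1] - 3 if i + 1 < len(starts) else len(stream)
        # Drop zero bytes that belong to the next start code, not to this unit.
        units.append(stream[begin:end].rstrip(b"\x00"))
    return units


def hevc_nal_type(unit: bytes) -> int:
    """nal_unit_type is the 6 bits after forbidden_zero_bit in the first byte."""
    return (unit[0] >> 1) & 0x3F


def differing_nal_types(a: bytes, b: bytes) -> list[int]:
    """Return the sorted NAL unit types whose bytes differ between a and b.

    Units are compared position by position, so both streams are assumed
    to hold the same number of NAL units in the same order.
    """
    pairs = zip(split_nal_units(a), split_nal_units(b))
    return sorted({hevc_nal_type(x) for x, y in pairs if x != y})
```

Feeding it the contents of two files (read with `open(path, "rb").read()`) returns a list of h265 NAL unit types: 32, 33, 34, and 39 are VPS, SPS, PPS, and PREFIX_SEI, while values 0 to 31 are slice (VCL) types. A result that contains no value below 32 means the slice payloads are byte-identical.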

OK, let's try with the RGB, YCoCg, and ICtCp color spaces.

$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv444p -colorspace rgb -qp 30 out.yuv444p.rgb.265
$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv444p -colorspace ycocg -qp 30 out.yuv444p.ycocg.265
$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv444p -colorspace ictcp -qp 30 out.yuv444p.ictcp.265
Step 4.

File sizes are 1706466, 1706466, and 1706473 bytes, respectively. The
identical size of the RGB and YCoCg files is very suspicious. Checking
their NALU data, I can see that the file contents are exactly the same,
except for the following:

* the PREFIX_SEI NALUs are the same size, but contain different
contents. This is likely the ffmpeg parameters encoded in the SEI
("colormatrix=0" for RGB vs. "colormatrix=8" for YCoCg).
* the SPS NALUs are the same size, but contain different
`matrix_coeffs` values (again, 0 for RGB, and 8 for YCoCg) in the VUI

That's it. Everything else is the same, including all the NALU slice
payloads. This makes no sense, as one of the encoded files is RGB, and
the other is a slight variation of YCoCg. In fact, playing the RGB
file I can see the colors are heavily distorted, as when you cast RGB
to YUV.

Checking the ICtCp file, the size is 7 bytes bigger, which is
explained by the 7x PREFIX_SEI NALUs in my file (the encoded string is
"colormatrix=14", one character longer than in the RGB and YCoCg
files). Apart from the SEI NALUs, the SPS NALUs are also
different, as their `VUI.matrix_coeffs` value is 14. All the NALU
slice payloads are the same.
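
For reference, the `matrix_coeffs` values seen so far can be kept in a small lookup. This is a partial sketch, not a copy of Table E.5: it lists only the values discussed here plus a few common ones, and the labels are informal shorthand rather than normative names.

```python
# matrix_coeffs values relevant to this thread. The labels are informal
# shorthand chosen for this sketch, not normative names.
MATRIX_COEFFS = {
    0: "Identity (GBR/RGB)",
    1: "BT.709",
    2: "Unspecified",
    8: "YCgCo",
    9: "BT.2020 non-constant luminance",
    14: "ICtCp",
}


def describe_matrix_coeffs(value: int) -> str:
    """Return a readable label for a VUI matrix_coeffs value."""
    return MATRIX_COEFFS.get(value, f"not listed in this sketch ({value})")
```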

Compared with the original yuv444p file, I see similar differences:

* SPS VUIs are different. When comparing the rgb/ycocg/ictcp cases
with the simple yuv444p, I can see that there are more differences in
the SPS VUI fields, which can be explained because in order to set
`matrix_coeffs` you need to first set the
`colour_description_present_flag`, and also provide valid
`colour_primaries` and `transfer_characteristics` values.
* the PREFIX_SEI values are different

But all the slice payloads are exactly the same.

So, it seems that setting the `-colorspace` parameter only affects the
SPS VUI. The encoded content will always be the yuv444p input, but it
will be marked as a different color space in the SPS VUI.

*Q1: Is this intended?*

Finally I tried using "rgb24" as the pix_fmt, and "rgb" as the colorspace.

$ ffmpeg -y -i input -c:v libx265 -pix_fmt rgb24 -colorspace rgb -qp 30 out.rgb24.rgb.265
Step 5.

The file is 2766445 bytes. That's 1.6x the yuv444p case. This is
slightly lower than what I expected, but in the ballpark.

Compared with the previous attempt to produce RGB (using yuv444p), I
can see the following differences:
* the SPS.VUI `video_full_range_flag` is 1 now (before it was 0).
Changing the pix_fmt causes the video to be encoded full range instead
of limited range. Other than that, the VPS/SPS/PPS content is exactly
the same
* the slice payloads are all different
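
As a reminder of what the full/limited range distinction means for sample values, here is a small sketch. It assumes the usual 8-bit convention where limited-range luma occupies 16..235 and full range occupies 0..255. The function name is made up for this example, and this is not a claim about how ffmpeg converts between the two.

```python
def full_to_limited_luma(value: int) -> int:
    """Map an 8-bit full-range sample (0..255) onto limited range (16..235)."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit sample")
    # 219 is the width of the limited range (235 - 16).
    return 16 + round(value * 219 / 255)
```

A decoder that trusts `video_full_range_flag` interprets the same stored sample differently in the two cases, which is why the flag matters for how the colors look.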

The out.rgb24.rgb.265 file has the colors right, which suggests this is
the right way to get h265 video using RGB.

| File                  | Size (bytes) |
| --------------------- | ------------ |
| out.yuv420p.265       | 1701366      |
| out.yuv444p.265       | 1706445      |
| out.yuv444p.rgb.265   | 1706466      |
| out.yuv444p.ycocg.265 | 1706466      |
| out.yuv444p.ictcp.265 | 1706473      |
| out.rgb24.rgb.265     | 2766445      |
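
The ratios quoted in this email can be recomputed from the table with a short script. The sizes are copied from the table above; the helper name is just for this sketch.

```python
H265_SIZES = {
    "out.yuv420p.265": 1701366,
    "out.yuv444p.265": 1706445,
    "out.yuv444p.rgb.265": 1706466,
    "out.yuv444p.ycocg.265": 1706466,
    "out.yuv444p.ictcp.265": 1706473,
    "out.rgb24.rgb.265": 2766445,
}


def size_ratios(sizes: dict[str, int], baseline: str) -> dict[str, float]:
    """Return each file size divided by the size of the baseline file."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    for name, ratio in size_ratios(H265_SIZES, "out.yuv444p.265").items():
        print(f"{name:24s} {ratio:.3f}x")
```

The same helper works for any other set of sizes by passing a different dictionary and baseline name.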

So it seems that the pix_fmt parameter does affect what is actually
sent to the encoder to encode, while the colorspace parameter only
affects how the SPS.VUI is actually set. This suggests there's no
actual support to encode anything but vanilla YUV: Running `ffmpeg
-pix_fmts` I see lots of combinations of 'y', 'u', 'v', 'j' (IIUC an
obsolete way to denote full range), and 'a' (alpha channels), but
nothing else.

*Q2: Did I get this right?*

Finally, I repeated all the experiments (steps 1 to 5) with h264. As
for the parser, I used [h264nal](https://github.com/chemag/h264nal).
Everything was the same except using "libx264" instead of "libx265".
The file sizes are now:

| File                  | Size (bytes) |
| --------------------- | ------------ |
| out.yuv420p.264       | 2023618      |
| out.yuv444p.264       | 2208675      |
| out.yuv444p.rgb.264   | 2208703      |
| out.yuv444p.ycocg.264 | 2208703      |
| out.yuv444p.ictcp.264 | 2208703      |
| out.rgb24.rgb.264     | 2208703      |
| out.rgb24.264         | 6002892      |

Interesting differences:
* the cost of going from 4:2:0 to 4:4:4 is 1.09x, which seems more
logical than the 1.003x we saw in h265.
* the file produced using "`-pix_fmt rgb24 -colorspace rgb`" is the
same as the one produced using "`-pix_fmt yuv444p -colorspace rgb`". Of
course, the colors are again broken. The fact that the behavior is
different for x264 vs. x265 is kind of worrying.

I found that you can use "`libx264rgb`" as the codec type.

$ ffmpeg -y -i input.264 -c:v libx264rgb -pix_fmt rgb24 -qp 30 out.rgb24.264
Step 6.

Note that the file size for the libx264rgb is 2.7x the size of the
yuv444p experiments, which is closer to my original 3x expectation
than in the x265 case.

Comparing the produced file with out.yuv444p.rgb.264, I can see the
following differences:
* the libx264rgb file is marked as full range (SPS.VUI
`video_full_range_flag` field is 1)
* the libx264rgb file uses different chroma QP index offsets
* all the slice payloads are different

Assuming I understood the function of pix_fmt and colorspace from the
h265 study, my hunch here is that the "`-pix_fmt rgb24 -colorspace
rgb`" combination is broken in h264, and that the libx264rgb mode was
created to fix the case. This would explain why there is no

*Q3: Did I get this right?*


PS: In case it matters, I'm running my own (modern) ffmpeg build:
