[FFmpeg-devel] [RFC] swscale dithering
Niklas Haas
ffmpeg at haasn.xyz
Mon Mar 24 14:43:19 EET 2025
Hi all,
As part of my ongoing swscale rewrite, we have both the opportunity and the
need to make a central decision about how to apply rounding and/or dithering.
Some particular cases I want to point out and gather feedback on include:
1. Should we dither and/or accurately round when scaling up full range
content? For example, say you are converting from full-range rgb24 to
rgb30. The correct conversion is (rgb / 255 * 1023), which involves a
rational factor of exactly 341 / 85, or roughly 4.01176. The fact that this
factor is not an integer means that an exact conversion without dithering,
while not strictly speaking *lossy*, necessarily introduces rounding error.
An input value of 200, for example, gives 200 * 1023 / 255 = 802.35294...,
which ought to be accurately dithered down to a roughly 65%/35% mix of 802
and 803.
This is not what current swscale (nor many other pieces of software) does;
instead it simply calculates the much easier (x << 2) | (x >> 6). This
amounts to chopping off the lowest 6 bits, i.e. truncating down. With a
light bit of extra effort we can at least round correctly by adding the
rounding bit ((x >> 5) & 1) to the result.
This is especially problematic for the alpha channel: a correct
upconversion of yuva444p to yuva444p10 would collapse to a simple left
shift by 2, if not for the presence of the alpha channel, which would
require a full float conversion, multiplication and dither pass.
2. At what bit depth does dithering become negligible? For context, the
generally quoted threshold of human visual perception is ~12 bits SDR and
~14 bits HDR. So for something like yuv444p16, we could get away with
outputting the truncated results, without dithering or accurate rounding,
and without the risk of humanly visible error. However, this does increase
the risk of a *compounding* error as more and more conversions are performed.
3. Should we dither per-channel after conversion from grayscale to RGB? For
example, say I am converting gray10 to rgb24. The most performant way to
do this would be to dither the gray channel down to gray8 and then copy it
to all three values (R, G, B) = (Y8). The more accurate way to do it, OTOH,
would be to set (R10, G10, B10) = (Y10) and then dither each channel
independently, with an offset dither mask per channel. This gives greater
precision, which may matter especially when dithering to a very low bit
depth (e.g. rgb8 or rgb4), but makes the conversion roughly 3x more
expensive.
4. What should we make of the SWS_ACCURATE_RND and SWS_BITEXACT flags? I am
personally thinking that SWS_BITEXACT should become a no-op flag, with
bit-exact output being the default behavior of all new implementations.
But what about SWS_ACCURATE_RND?
I am thinking that SWS_ACCURATE_RND should essentially be the switch that
toggles our preferred resolution of question 1. So in other words, with
SWS_ACCURATE_RND specified, full range upconversions should go through an
accurate dither pass, while being relaxed to the simple (x << 2) | (x >> 6)
upconversion in the absence of this flag.
How should this flag relate to question 2? With the flag specified, I am
thinking that we should also force dithering even at 16 bit depth, and
skip dithering in this case only in the flag's absence. If so, what
bit depth should the cutoff threshold be, for when to skip accurate
dithering? I am thinking to simply use the 12/14 bit SDR/HDR threshold as
appropriate for the content type.
This would lead to the following conversions, as an illustration:
SWS_ACCURATE_RND specified:
- rgb24 -> yuv420p10: full dithering
- rgb24 -> yuv420p12: full dithering
- rgb24 -> rgb30: full dithering
- rgb24 -> rgba64: full dithering
- yuva444p -> yuva444p10: scale YUV, dither alpha
- yuva444p14 -> yuva444p16: scale YUV, dither alpha
- yuv444p10 -> yuv444p14: left shift, no dithering needed
SWS_ACCURATE_RND absent:
- rgb24 -> yuv420p10: full dithering
- rgb24 -> yuv420p12: truncate if SDR, full dithering if HDR
- rgb24 -> rgb30: truncate
- rgb24 -> rgba64: truncate
- yuva444p -> yuva444p10: left shift YUV, truncate alpha
- yuva444p14 -> yuva444p16: left shift YUV, truncate alpha
Does this seem reasonable?