[FFmpeg-devel] [RFC] swscale dithering
Niklas Haas
ffmpeg at haasn.xyz
Mon Mar 24 14:43:19 EET 2025
Hi all,
As part of my ongoing swscale rewrite, we have both the opportunity and the
need to make a central decision about how to apply rounding and/or dithering.
Some particular cases I want to point out and gather feedback on include:
1. Should we dither and/or accurately round when scaling up full range
content? For example, say you are converting from full-range rgb24 to
rgb30. The correct conversion is (rgb / 255 * 1023), which involves a
rational factor of exactly 341 / 85, or roughly 4.01176. The fact that this
factor is not an integer means that an exact conversion without dithering,
while not strictly speaking *lossy*, necessarily introduces rounding error.
An input value of 200, for example, gives 200 * 1023 / 255 = 802.35294...,
which ought to be accurately dithered down to a roughly 65%/35% mix of 802
and 803.
This is not what current swscale (nor many other pieces of software) does;
instead it simply calculates the much easier (x << 2) | (x >> 6). This
amounts to chopping off the lowest 6 bits, i.e. truncating down. With a
light bit of extra effort we can at least round correctly by adding the
rounding bit ((x >> 5) & 1) to the result.
This is especially problematic for the alpha channel: a correct
upconversion of yuva444p to yuva444p10 would collapse to a simple left
shift by 2, if not for the presence of the alpha channel, which would
require a full float conversion, multiplication and dither pass.
2. At what bit depth does dithering become negligible? For context, the
generally quoted threshold of human visual perception is ~12 bits SDR and
~14 bits HDR. So for something like yuv444p16, we could get away with
outputting the truncated results, without dithering or accurate rounding,
and without the risk of humanly visible error. However, this does increase
the risk of a *compounding* error as more and more conversions are performed.
3. Should we dither per-channel after conversion from grayscale to RGB? For
example, say I am converting gray10 to rgb24. The most performant way to
do this would be to dither the gray channel down to gray8 and then copy it
to all three values (R, G, B) = (Y8). The more accurate way to do it, OTOH,
would be to set (R10, G10, B10) = (Y10) and then dither each channel
independently, with an offset dither mask per channel. This gives greater
precision, which may matter especially when dithering to a very low bit
depth (e.g. rgb8 or rgb4), but makes the conversion roughly 3x more
expensive.
4. What should we make of the SWS_ACCURATE_RND and SWS_BITEXACT flags? I am
personally thinking that SWS_BITEXACT should become a no-op flag, with
bit-exact output being the default behavior of all new implementations.
But what about SWS_ACCURATE_RND?
I am thinking that SWS_ACCURATE_RND should essentially be the switch that
toggles our preferred resolution of question 1. So in other words, with
SWS_ACCURATE_RND specified, full range upconversions should go through an
accurate dither pass, while being relaxed to the simple (x << 2) | (x >> 6)
upconversion in the absence of this flag.
How should this flag relate to question 2? With the flag specified, I am
thinking that we should also force dithering even at 16 bit depth, and
skip dithering in this case only in the flag's absence. If so, what
bit depth should the cutoff threshold be, for when to skip accurate
dithering? I am thinking to simply use the 12/14 bit SDR/HDR threshold as
appropriate for the content type.
This would lead to the following conversions, as an illustration:
SWS_ACCURATE_RND specified:
- rgb24 -> yuv420p10: full dithering
- rgb24 -> yuv420p12: full dithering
- rgb24 -> rgb30: full dithering
- rgb24 -> rgba64: full dithering
- yuva444p -> yuva444p10: scale YUV, dither alpha
- yuva444p14 -> yuva444p16: scale YUV, dither alpha
- yuv444p10 -> yuv444p14: left shift, no dithering needed
SWS_ACCURATE_RND absent:
- rgb24 -> yuv420p10: full dithering
- rgb24 -> yuv420p12: truncate if SDR, full dithering if HDR
- rgb24 -> rgb30: truncate
- rgb24 -> rgba64: truncate
- yuva444p -> yuva444p10: left shift YUV, truncate alpha
- yuva444p14 -> yuva444p16: left shift YUV, truncate alpha
Does this seem reasonable?