[FFmpeg-devel] [RFC] swscale modernization proposal

Michael Niedermayer michael at niedermayer.cc
Sat Jun 29 15:35:32 EEST 2024


On Sat, Jun 29, 2024 at 01:47:43PM +0200, Niklas Haas wrote:
> On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas <ffmpeg at haasn.xyz> wrote:
> > Hey,
> > 
> > As some of you know, I got contracted (by STF 2024) to work on improving
> > swscale, over the course of the next couple of months. I want to share my
> > current plans and gather feedback + measure sentiment.
> > 
> > ## Problem statement
> > 
> > The two issues I'd like to focus on for now are:
> > 
> > 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
> >    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> > 2. Complicated context management, with cascaded contexts, threading, stateful
> >    configuration, multi-step init procedures, etc; and related bugs
> > 
> > In order to make these feasible, some amount of internal re-organization of
> > duties inside swscale is prudent.
> > 
> > ## Proposed approach
> > 
> > The first step is to create a new API, which will (tentatively) live in
> > <libswscale/avscale.h>. This API will initially start off as a near-copy of the
> > current swscale public API, but with the major difference that I want it to be
> > state-free and only access metadata in terms of AVFrame properties. So there
> > will be no independent configuration of the input chroma location etc. like
> > there is currently, and no need to re-configure or re-init the context when
> > feeding it frames with different properties. The goal is for users to be able
> > to just feed it AVFrame pairs and have it internally cache expensive
> > pre-processing steps as needed. Finally, avscale_* should ultimately also
> > support hardware frames directly, in which case it will dispatch to some
> > equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
> > defer this to a future milestone)
> 
> So, I've spent the past days implementing this API and hooking it up to
> swscale internally. (For testing, I am also replacing `vf_scale` by the
> equivalent AVScale-based implementation to see how the new API impacts
> existing users). It mostly works so far, with some left-over translation
> issues that I have to address before it can be sent upstream.
> 
> ------
> 
> One of the things I was thinking about was how to configure
> scalers/dither modes, which sws currently, somewhat clunkily, controls
> with flags. IMO, flags are not the right design here - if anything, it
> should be a separate enum/int, and controllable separately for chroma
> resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
> 
> That said, I think that for most end users, having such fine-grained
> options is not really providing any end value - unless you're already
> knee-deep in signal theory, the actual differences between, say,
> "natural bicubic spline" and "Lanczos" are obtuse at best and alien at
> worst.
> 
> My idea was to provide a single `int quality`, which the user can set to
> tune the speed <-> quality trade-off on an arbitrary numeric scale from
> 0 to 10, with 0 being the fastest (alias everything, nearest neighbour,
> drop half chroma samples, etc.), the default being something in the
> vicinity of 3-5, and 10 being the maximum quality (full linear
> downscaling, anti-aliasing, error diffusion, etc.).

I think 10 levels is not fine-grained enough:
when there are more than 10 features to switch on/off, we would have
to switch more than 1 at a time.
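
A minimal sketch of that counting argument, assuming a purely hypothetical
set of 12 features sorted from cheapest to most expensive (neither the
feature count nor the mapping below comes from swscale or the proposal):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUALITY 10   /* proposed scale: 0..10 */
#define NB_FEATURES 12   /* hypothetical: more features than quality steps */

/* Bitmask of enabled features for a quality level. Feature i is bit i;
 * features are assumed to be sorted from cheapest to most expensive. */
uint32_t quality_to_features(int quality)
{
    assert(quality >= 0 && quality <= MAX_QUALITY);
    int enabled = quality * NB_FEATURES / MAX_QUALITY;
    return (UINT32_C(1) << enabled) - 1;
}

/* Number of features that get switched on when going from
 * quality - 1 to quality. */
int features_added_by_step(int quality)
{
    assert(quality >= 1 && quality <= MAX_QUALITY);
    uint32_t diff = quality_to_features(quality) ^
                    quality_to_features(quality - 1);
    int count = 0;
    for (; diff; diff &= diff - 1)  /* clear the lowest set bit */
        count++;
    return count;
}
```

With these numbers the steps 4 -> 5 and 9 -> 10 each switch on two features
at once, so a regression appearing at one of those levels cannot be pinned
on a single feature from the quality value alone.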

Also, the scale has an issue that becomes obvious when you consider the
extremes, like memset(0) at level 0, not converting chroma at level 1,
using a neural net at 9, and hiring a human artist to paint a matching
upscaled image at 10.

The quality factor would thus probably have at least 3 ranges:
1. as fast as possible, with noticeable quality issues
2. the normal range
3. as good as possible, disregarding the computation needed

Some encoders (like x264) use words like ultrafast and placebo for the ends
of this curve.
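
As a rough sketch of named presets grouped into those three ranges: the
preset names below are the ones x264 uses, but the grouping into ranges and
every identifier is an assumption made for illustration, not an existing or
proposed swscale API.

```c
#include <assert.h>
#include <string.h>

/* Preset names as used by x264, fastest first. */
const char *const preset_names[] = {
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
};
#define NB_PRESETS ((int)(sizeof(preset_names) / sizeof(preset_names[0])))

/* Hypothetical grouping into the three ranges described above. */
enum quality_range { RANGE_FASTEST, RANGE_NORMAL, RANGE_BEST };

/* Return the index of a preset name, or -1 if it is unknown. */
int preset_lookup(const char *name)
{
    for (int i = 0; i < NB_PRESETS; i++) {
        if (!strcmp(name, preset_names[i]))
            return i;
    }
    return -1;
}

/* Only the two ends of the list leave the normal range. */
enum quality_range preset_range(int preset)
{
    assert(preset >= 0 && preset < NB_PRESETS);
    if (preset == 0)
        return RANGE_FASTEST;
    if (preset == NB_PRESETS - 1)
        return RANGE_BEST;
    return RANGE_NORMAL;
}
```

Names make it explicit that the two ends behave differently from the middle,
which a bare number between 0 and 10 does not convey.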

It would also be possible to use a more formal definition of how much
quality one wants to trade per unit of time spent, but that makes it harder
to decide which features to actually turn on when one requests a ratio
between PSNR and seconds.


> 
> The upside of this approach is that it would be vastly simpler for most
> end users. It would also track newly added functionality automatically;
> e.g. if we get a higher-quality tone mapping mode, it can be
> retroactively added to the higher quality presets. The biggest downside
> I can think of is that doing this would arguably violate the semantics
> of a "bitexact" flag, since it would break results relative to
> a previous version of libswscale - unless we maybe also force a specific
> quality level in bitexact mode?
> 
> Open questions:
> 
> 1. Is this a good idea, or do the downsides outweigh the benefits?



> 2. Is an "advanced configuration" API still needed, in addition to the
>    quality presets?

For regression testing and debugging it is very useful to be able to turn
features on one at a time. A failure could then be quickly isolated to
a feature.
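
A minimal sketch of such an isolation loop, assuming features can be selected
through a bitmask. The callback type, the function names and the simulated
bug are all hypothetical and only illustrate the idea:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: runs the conversion with the given feature bits
 * enabled and returns 1 if the output matches the reference, 0 otherwise. */
typedef int (*check_fn)(uint32_t features);

/* Enable one feature at a time; return the index of the first feature
 * whose run fails the check, or -1 if every single-feature run passes. */
int isolate_failing_feature(check_fn check, int nb_features)
{
    assert(check && nb_features >= 0 && nb_features <= 32);
    for (int i = 0; i < nb_features; i++) {
        if (!check(UINT32_C(1) << i))
            return i;
    }
    return -1;
}

/* Stand-in for a real regression test: pretends feature 3 is broken. */
int example_check(uint32_t features)
{
    return !(features & (UINT32_C(1) << 3));
}
```

Calling isolate_failing_feature(example_check, 8) points straight at
feature 3. With only a combined quality level, the same failure would show
up at every level that happens to include that feature.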



[...]

> /**
>  * Statically test if a conversion is supported. Values of (respectively)
>  * NONE/UNSPECIFIED are ignored.
>  *
>  * Returns 1 if the conversion is supported, or 0 otherwise.
>  */
> int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
> int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
> int avscale_test_primaries(enum AVColorPrimaries dst, enum AVColorPrimaries src);
> int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
>                           enum AVColorTransferCharacteristic src);

If we support A as an input at all and support B as an output at all, then
we should support converting from A to B.

I don't think this API is a good idea. It allows supporting random subsets,
which would cause confusion and weird bugs in code using it.
(For example, removal of an intermediate filter could lead to failure.)
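
To sketch the rule I mean: if support is decided per format (readable as
input, writable as output) rather than per pair, then dropping an
intermediate step can never turn a working chain into a failing one. The
format list and capability tables below are made up for illustration;
libswscale's existing sws_isSupportedInput() / sws_isSupportedOutput() are
per-format queries of this kind.

```c
#include <assert.h>

/* Hypothetical format list, for illustration; not AVPixelFormat. */
enum fmt { FMT_A, FMT_B, FMT_C, FMT_NB };

/* Per-format capabilities (made-up values). */
const int can_read[FMT_NB]  = { 1, 1, 0 };
const int can_write[FMT_NB] = { 1, 1, 1 };

/* A pair is supported exactly when the source is readable and the
 * destination is writable. Argument order follows the proposed
 * avscale_test_*() functions: dst first, then src. */
int conversion_supported(enum fmt dst, enum fmt src)
{
    assert((unsigned)dst < FMT_NB && (unsigned)src < FMT_NB);
    return can_read[src] && can_write[dst];
}

/* Check every chain src -> mid -> dst: if both hops are supported, the
 * direct conversion src -> dst must be supported too.
 * Returns 1 if that holds for all triples, 0 otherwise. */
int chain_removal_is_safe(void)
{
    for (int src = 0; src < FMT_NB; src++) {
        for (int mid = 0; mid < FMT_NB; mid++) {
            for (int dst = 0; dst < FMT_NB; dst++) {
                if (conversion_supported(mid, src) &&
                    conversion_supported(dst, mid) &&
                    !conversion_supported(dst, src))
                    return 0;
            }
        }
    }
    return 1;
}
```

Under this rule chain_removal_is_safe() returns 1 for any contents of the
two tables. A per-pair test function makes no such guarantee.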

[...]

thx

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Elect your leaders based on what they did after the last election, not
based on what they say before an election.
