[FFmpeg-devel] [PATCH] libavfilter: temporarily remove DNN framework and vf_sr filter

Ronald S. Bultje rsbultje at gmail.com
Fri Jul 27 14:12:45 EEST 2018


Hi,

On Fri, Jul 27, 2018 at 7:02 AM, Ronald S. Bultje <rsbultje at gmail.com>
wrote:

> Hi,
>
> On Thu, Jul 26, 2018 at 11:49 PM, Pedro Arthur <bygrandao at gmail.com>
> wrote:
>
>> Taking vp9 as an example, sure, the coefficients are obtained by the
>> 'poly3', but the real data are the polynomial coefficients. Does anyone
>> ask where these polynomial coefficients came from, about reproducibility,
>> etc.? Your comparison does not seem fair to me.
>
>
> The ~1.02 in VP9/AV1 and the 6 in HEVC? They are step sizes (in different
> units). It's perfectly clear where they come from, and it's obvious how to
> adjust them. You want more than 52 QPs over the same range in HEVC?
> Increase the 6.
>
> How do I adjust the NN coefficients to get a defined adjustment in
> behaviour?
>

Actually, since I know you'll answer as though you don't get it, let me
explain it in more detail: the adjustment of the quantizer - the derivation
of the coefficients, if you will - is defined entirely by the process
itself. The external tools used to validate it are the actual encoding
software (the model). You call this training? Sure, but then the software
created by the training *is* the training software itself. In all
reasonableness, it's clear what each value means and how adjusting it
affects the process - that is, if you are willing to understand.
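To make that concrete: the HEVC step size follows a simple closed form
(roughly Qstep = 2^((QP-4)/6) - the exact spec uses tables, so treat this
as an approximation), which is why the effect of each constant is
predictable. A quick sketch:

```python
# Rough sketch of the HEVC QP-to-step-size relation: the step size
# approximately doubles every 6 QP steps. (Approximation, not the exact
# table-driven formula from the spec.)
def qstep(qp, qp_per_doubling=6):
    return 2 ** ((qp - 4) / qp_per_doubling)

# With the constant at 6, 6 QP steps double the step size:
ratio = qstep(10) / qstep(4)   # 2.0

# "Increase the 6" -> finer granularity, i.e. more usable QPs over the
# same range of step sizes:
ratio12 = qstep(16, qp_per_doubling=12) / qstep(4, qp_per_doubling=12)
```

Every constant here has a documented, adjustable meaning - that's the
contrast with the NN case below.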

NN is different. The exposed software *uses* but does not *generate* the
coefficients. In fact, the meaning of most coefficients is completely
opaque. They are just multiplier and addition constants with no obvious
relationship to reality, other than that they were generated by some
magical process which tells us that, statistically (??), this bears
resemblance to some trained form of reality. You want to adjust behaviour
for a defined, different outcome? You'll have to retrain. That's all fine,
don't get me wrong, but then we need to know how to retrain, since it's not
obvious to us. Just tell us what process and what data were used to get
this outcome, so that we can get a different outcome if we so choose (that
*is* the freedom to modify).
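For illustration (a generic toy layer, not the vf_sr code), this is all an
NN coefficient is - a multiplier or an addend in a multiply-add chain, with
no per-value documentation and no "increase the 6"-style knob:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))   # stand-ins for trained weight constants
b = rng.normal(size=3)        # stand-ins for trained bias constants

def layer(x):
    # Multiply-add, then a clamp (ReLU). The values in W and b carry no
    # documented individual meaning; changing one of them has no
    # predictable, specified effect on the output.
    return np.maximum(W @ x + b, 0.0)
```

Hence the request: without the training process and data, there is no way
to derive, verify, or purposefully change these constants.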

Please document (and make available) all such information for the
re-training.

There's a second, orthogonal concern: many of the filters we're seeing now
are filters that convert X to Y, or do process Z on data A. That's great. I
love it. I have no problem with it. I see a filter that converts 8-bit RGB
to 10-bit HDR RGB and it's fantastic; it's a little bit like colorspace
(which I wrote), but maybe it can actually get closer to the intended
(rather than artifacted) source, which would be fantastic. I hope it works.
But: how do we know that it does? For example, for the colorspace filter,
it's clear, because all coefficients come from the reference documentation.
For the NN filter version, how do I know it works? Seriously. If I were an
anonymous Russian submitting such a filter to the American elections, how
can I guarantee that it does what I say it does rather than just randomly
garbling the lower bits? How would I know the difference? Again, I need to
know how it was trained.

Lastly, if all these filters do the same thing but with a different set of
coefficients, then why not simply have one filter with loadable
coefficients? Maybe with subclassing etc. for better documentation - it
would also make SIMD a lot easier, since we only have to write it once.
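A rough sketch of what I mean (names, the load format, and the file-based
model loading are all hypothetical, not a proposed API):

```python
import numpy as np

class CoefFilter:
    """Hypothetical sketch: one generic filter whose per-model
    coefficients are loaded at init time, so the hot multiply-add
    loop - the part worth writing SIMD for - exists exactly once."""

    def __init__(self, src):
        # src: a path or file-like object holding an .npz archive
        # (np.load accepts either). One archive per "model"/use case.
        data = np.load(src)
        self.weights = [data[k] for k in sorted(data.files)]

    def run(self, frame):
        # Same inner loop for every model; only the coefficients differ.
        x = frame
        for W in self.weights:
            x = np.maximum(W @ x, 0.0)
        return x
```

Subclasses (sr, hdr-convert, ...) would then only document and validate
their coefficient sets, not reimplement the core.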

Ronald
