[FFmpeg-devel] [FFmpeg-cvslog] Adds ESPCN super resolution filter merged with SRCNN filter.

Pedro Arthur bygrandao at gmail.com
Mon Jul 2 23:28:17 EEST 2018


2018-07-02 17:05 GMT-03:00 Jean-Baptiste Kempf <jb at videolan.org>:
> Hello,
>
> On Mon, 2 Jul 2018, at 21:54, Pedro Arthur wrote:
>> 2018-07-02 16:19 GMT-03:00 Jean-Baptiste Kempf <jb at videolan.org>:
>> > On Mon, 2 Jul 2018, at 20:52, Pedro Arthur wrote:
>> >> >> > Where do they come from, how can we recreate them?
>> >> >> Paper link [1], and web page with reference matlab code [2].
>> >> >
>> >> > This code is not open source, and is not compatible with LGPLv2.1:
>> >> >
>> >> > "If you use/adapt our code in your work (either as a stand-alone tool or as a component of any algorithm),
>> >> > you need to appropriately cite our ECCV 2014 paper or arXiv paper."
>> >> >
>> >> > Reimplementation of the code in a different language does not remove IP.
>> >> The paper is properly cited in libavfilter/vf_sr.c.
>> >>
>> >> I'm not a specialist in IP rights, but there is nothing novel in
>> >> the code (TBH there is no code at all, look at srcnn.m); it is just
>> >> plain convolution, basic math machinery. We are just applying a
>> >> bunch of convolution filters to an image.
>> >
>> > Well, either it is based on a paper and the matlab code, or it is not.
>> > If it is, it's a big issue, since this is not open source compatible.
>> >
>> > If it is just convolution, then why is it based on this paper?
>> Our code does basic convolutional neural network operations, the same
>> as lots of other ML libraries; this is simple math, not IP.
>
> Yet, there are lots of tables of unexplained data arrays. Where are those numbers coming from?
>
>> The paper proposes a model (basically a filter layout) where it uses 3
>> convolutional layers with filters (input x width x height x output) of
>> 1x9x9x64, 64x1x1x32, 32x5x5x1.
>> Therefore the only thing you could consider novel is the filter
>> dimensions. Training also does not constitute IP, as it is just
>> applying a standard optimization algorithm, like Gradient Descent, to
>> the model.
>>
>> Considering filter size as IP seems (to me) as absurd as considering
>> an image size as IP.
>
> Then what are the numbers?
>
>> > What is the source of those numbers?
>> They are the filters (or weights), the result of training the model
>> with a particular set of image data.
>
> Where is the data set?
>
> Sorry, you cannot dump 10kB of array and not explain where they come from.
We have the raw weights (the float arrays), and we have a serialized
TensorFlow model.
The raw weights are used by our own "native" implementation; the
serialized TF model is used when the user enables compilation with the
TF library.
The TF model was constructed and serialized by our student Sergey.
What may be done, if that is your concern, is to build the TF model on
the fly from the raw weights.
Other than that, I don't have any solution for the large size of the
filters, except disabling the sr filter in your build.
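To make the "plain convolution" point concrete, here is a minimal
sketch of what the native path does with the raw weight arrays. This
is illustrative only, not the actual vf_sr.c code; the weight layout
and the function signature here are my assumptions. Each layer of the
model is nothing more than this loop, run three times with the
1x9x9x64, 64x1x1x32 and 32x5x5x1 weight sets:

/* Sketch only: one plain 2D convolution layer over planar float data.
 * Weight layout assumed here: w[oc][ic][ky][kx], flattened. */
static void conv2d(const float *in, float *out,
                   const float *w, const float *bias,
                   int width, int height,
                   int in_ch, int out_ch, int ksize)
{
    int pad = ksize / 2;

    for (int oc = 0; oc < out_ch; oc++)
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++) {
                float sum = bias[oc];
                for (int ic = 0; ic < in_ch; ic++)
                    for (int ky = 0; ky < ksize; ky++)
                        for (int kx = 0; kx < ksize; kx++) {
                            int sy = y + ky - pad;
                            int sx = x + kx - pad;
                            if (sy < 0 || sy >= height ||
                                sx < 0 || sx >= width)
                                continue; /* zero padding at borders */
                            sum += w[((oc * in_ch + ic) * ksize + ky)
                                     * ksize + kx]
                                 * in[(ic * height + sy) * width + sx];
                        }
                /* activation between layers omitted for brevity */
                out[(oc * height + y) * width + x] = sum;
            }
}

There is no magic beyond this; the weights are the only part that
comes from training.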

>
> 9479955 was already bad, with 3 convolution kernels dumped with almost no explanation.
> Then most of those kernels were basically reverted in bdf1bbd.
> And then other unexplained numbers were added in d8c0bbb0.
As stated previously, the data are the weights; when we achieve better
image quality through training, we update the weights.
On the math side, the optimization problem can have multiple minima,
so two completely different sets of kernels may achieve the same or
very close quality (each being a minimum point).
As the training is stochastic, it is almost certain that each training
run will generate different kernels.
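If it helps, here is a self-contained toy (not our training code, just
an illustration made up for this thread) showing the multiple-minima
point: gradient descent on f(w) = (w*w - 1)^2, which has two equally
good minima at w = +1 and w = -1. The random starting point decides
which one a run lands in, the same way stochastic training lands
different runs on different, equally good kernel sets:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    srand((unsigned)time(NULL));
    for (int run = 0; run < 4; run++) {
        /* random initialization in [-2, 2] */
        double w = 4.0 * rand() / RAND_MAX - 2.0;
        for (int step = 0; step < 1000; step++) {
            double grad = 4.0 * w * (w * w - 1.0); /* df/dw */
            w -= 0.01 * grad;                      /* w <- w - lr * grad */
        }
        printf("run %d: w = %+.4f\n", run, w);
    }
    return 0;
}

Every run converges to +1 or -1 with the same (zero) loss; which one
you get depends only on the random start.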

>
> But this one is beyond that...
>
> (Not to mention the invalid license)
>
> --
> Jean-Baptiste Kempf - President
> +33 672 704 734

