[FFmpeg-devel] [PATCH 2/2] libavfilter: Removes stored DNN models. Adds support for native backend model file format in tf backend.

Sergey Lavrushkin dualfal at gmail.com
Mon Aug 20 21:22:52 EEST 2018

2018-08-17 18:18 GMT+03:00 Sergey Lavrushkin <dualfal at gmail.com>:

> 2018-08-17 17:46 GMT+03:00 Pedro Arthur <bygrandao at gmail.com>:
>> Hi,
>> You did not provide any pre-trained model files, so anyone trying to
>> test it has to perform the whole training!
>> I'm attaching the models I generated, if anyone is interested in testing
>> it.
>> When applying the filter with tf backend there are artifacts in the
>> borders, for both srcnn and espcn (out_[srcnn|espcn]_tf.jpg).
>> It seems that a few lines in the top row of the image are repeated for
>> espcn using native backend (out_srcnn_nt.jpg).
> I guess it is because I didn't add any padding to the image and tf fills
> borders with 0 for 'SAME' padding in convolutions. I'll add the required
> padding size calculation and insert a padding operation into the graph.
>> The model/model_filename options are not coherent, the model type
>> should be defined in the file anyway therefore there is no need for
>> both options.
>> It is also buggy, if you specify the model_filename but not the model
>> type it will default to srcnn even if the model file is for espcn, no
>> error is generated and the output ofc is buggy.
> I think I can remove the model type and check whether the model changes
> the input size. I think all my switches on model type actually depend on
> this condition. If I remove the conversions inside the filter and make it
> work only for one plane, it basically becomes a filter that executes a
> neural network for one-channel input. But there is a problem with the
> float format: it breaks FATE on some 32-bit hosts, as James stated, and I
> first need to fix this issue or otherwise revert to doing the conversions
> in the filter.
>> I personally would prefer to use only model=file as it is shorter than
>> model_filename=file.

I updated this patch. I added padding to the tf model constructed from
the backend model file and updated the sr filter to work only with the
float gray format, removing all conversions and scaling from it.
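
For the padding added to the tf model, the size calculation could look
roughly like the sketch below. This is not the code from the patch. It
assumes stride-1, non-dilated convolutions with odd kernel sizes, where
'SAME' padding adds (k - 1) / 2 zero pixels per side, so the sum over all
layers is what has to be supplied up front. total_pad is a hypothetical
helper, and the 9-1-5 kernel sizes are only an illustrative assumption.

```shell
#!/bin/sh
# Sketch only: total_pad is a hypothetical helper, not part of the patch.
# Assumes stride-1, non-dilated convolutions with odd kernel sizes.
# Each such layer consumes (k - 1) / 2 border pixels per side, so the
# total padding per side is the sum over all layers.
total_pad() {
    pad=0
    for k in "$@"; do
        pad=$((pad + (k - 1) / 2))
    done
    echo "$pad"
}

# Illustrative kernel sizes (an assumption): 9, 1 and 5.
total_pad 9 1 5    # prints 6
```

Padding the input by that amount with something other than zeros (for
example mirrored pixels) keeps the zero-filled borders out of the output.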

But there are some issues. First, to use this filter for formats with
chroma channels I do:

    ffmpeg -i in.bmp -filter_complex 'extractplanes=y+u+v[y][u][v]' -map '[y]' y.bmp -map '[u]' u.bmp -map '[v]' v.bmp
    ffmpeg -i y.bmp -vf sr=dnn_backend=tensorflow:model=espcn.model y2.bmp
    ffmpeg -i u.bmp -vf scale=iw*2:-1 u2.bmp
    ffmpeg -i v.bmp -vf scale=iw*2:-1 v2.bmp
    ffmpeg -i y2.bmp -i u2.bmp -i v2.bmp -filter_complex 'mergeplanes=0x001020:yuv444p' out.bmp

Can these commands be merged into one command? I haven't added any
examples to filters.texi yet, because this example may be a bad one with
all these intermediate outputs.
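
One possible way to merge them is a single -filter_complex graph that
chains the same filters through link labels. The sketch below is untested:
it reuses the file names and filter options from the commands above, and
it assumes that ffmpeg auto-inserts the pixel format conversions needed
between extractplanes, sr and mergeplanes. The script only builds and
prints the command instead of running it.

```shell
#!/bin/sh
# Untested sketch: the same five steps as one filtergraph. The labels
# [y2], [u2] and [v2] replace the intermediate .bmp files.
graph='extractplanes=y+u+v[y][u][v];'
graph="${graph}[y]sr=dnn_backend=tensorflow:model=espcn.model[y2];"
graph="${graph}[u]scale=iw*2:-1[u2];"
graph="${graph}[v]scale=iw*2:-1[v2];"
graph="${graph}[y2][u2][v2]mergeplanes=0x001020:yuv444p"

# Print the resulting command line; copy it to a shell to try it.
printf 'ffmpeg -i in.bmp -filter_complex "%s" out.bmp\n' "$graph"
```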

Another issue is that this filter has basically become a filter that
executes a neural network for one-channel input, and this network does not
have to be a super-resolution network (although the native backend
supports only a few layer types, basically just enough for sr models).
Maybe this filter should be renamed to something like dnn? It can be
extended later to support inputs with more
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-libavfilter-Removes-stored-DNN-models.-Adds-support-.patch
Type: text/x-patch
Size: 563171 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20180820/b7c7003f/attachment.bin>