[FFmpeg-devel] [PATCH V5] Add a filter implementing HDR image generation from a single exposure using deep CNNs

Guo, Yejun yejun.guo at intel.com
Fri Nov 16 08:26:08 EET 2018



> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Li, Zhong
> Sent: Thursday, November 15, 2018 8:22 PM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH V5] Add a filter implementing HDR
> image generation from a single exposure using deep CNNs
> 
> > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> Behalf
> > Of Liu Steven
> > Sent: Thursday, November 15, 2018 5:40 PM
> > To: FFmpeg development discussions and patches
> > <ffmpeg-devel at ffmpeg.org>
> > Cc: Liu Steven <lq at chinaffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [PATCH V5] Add a filter implementing HDR
> > image generation from a single exposure using deep CNNs
> >
> >
> >
> > > On Nov 14, 2018, at 8:15 PM, Guo, Yejun <yejun.guo at intel.com> wrote:
> > >
> > > see the algorithm's paper and code below.
> > >
> > > the filter's parameters look like:
> > >
> > > sdr2hdr=model_filename=/path_to_tensorflow_graph.pb:out_fmt=gbrp10le
> > >
> > > The input of the deep CNN model is RGB24 while the output is float
> > > for each color channel, so the filter's default output format is
> > > gbrpf32le. gbrp10le is also supported as output, so the rendering
> > > result can be viewed in a player, as a reference.
> > >
> > > To generate the model file, we need to modify the original script a little:
> > > - set name='y' for y_final within the script at
> > > https://github.com/gabrieleilertsen/hdrcnn/blob/master/network.py
> > > - add the following code to the script at
> > > https://github.com/gabrieleilertsen/hdrcnn/blob/master/hdrcnn_predict.py
> > >
> > > graph = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ["y"])
> > > tf.train.write_graph(graph, '.', 'graph.pb', as_text=False)
> > >
> > > And I also uploaded the model file under
> > > https://drive.google.com/drive/folders/1URsRY5g-VdE-kHlP5vQoLoimMIZ-SX00?usp=sharing.
> > >
> > > The filter only works when the TensorFlow C API is available on the
> > > system; the native backend is not supported, since the deep CNN model
> > > contains several layer types besides CONV and DEPTH_TO_SPACE.
> > >
> > > https://arxiv.org/pdf/1710.07480.pdf:
> > >  author       = "Eilertsen, Gabriel and Kronander, Joel and Denes, Gyorgy and Mantiuk, Rafał and Unger, Jonas",
> > >  title        = "HDR image reconstruction from a single exposure using deep CNNs",
> > >  journal      = "ACM Transactions on Graphics (TOG)",
> > >  number       = "6",
> > >  volume       = "36",
> > >  articleno    = "178",
> > >  year         = "2017"
> > >
> > > https://github.com/gabrieleilertsen/hdrcnn
> > >
> > > btw, as a whole solution, metadata should also be generated from the
> > > SDR video so it can be encoded as an HDR video. That is not supported
> > > yet; this patch just focuses on this paper.
> > >
> > > This filter accepts 8-bit frames (RGB24) and outputs 10-bit/float
> > > frames. There is no reference image, so it is not feasible to use
> > > criteria such as PSNR or SSIM.
> > >
> > > I chose the same method described in the paper to demo the filter's
> > > effect: the frames before/after the filter are reduced by 3 stops.
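To be concrete about the 3-stop reduction: each stop halves the linear
light, so the comparison frames are scaled by 2^-3 = 0.125. A minimal
illustration (hypothetical helper, not part of the patch):

    #include <math.h>

    // reduce the exposure of a linear-light sample by n stops
    static float reduce_stops(float v, int n)
    {
        return v * powf(2.0f, -(float)n); // n = 3 gives v * 0.125
    }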
> > >
> > > The native video (test.native.mp4) is created from 7 png files at
> > > https://github.com/gabrieleilertsen/hdrcnn/tree/master/data (the
> > > images are enlarged to 1920x1080 with the extra area filled with
> > > white), with the command line:
> > >
> > > ffmpeg -f image2 -i ./img_%03d.png -c:v libx264 -preset veryslow -crf 1 test.native.mp4
> > >
> > > And two rgb24 videos are generated before/after the filter with -3
> > > stops by modifying the code a little; see the video folder at the
> > > google drive (the same place where the model file is located).
> > >
> > > For your convenience, I also dumped png files from the generated
> > > videos and combined the before/after pngs into one file; see the png
> > > folder at the google drive.
> 
> I see three limitations in the code that haven't been noted in the texi or
> commit message:
> 1. Only one resolution is supported: 1920x1080. Other resolutions can't be
> supported.
> 2. RGB24 is the only input format that can be supported.
> 3. No metadata is generated, which may break the encoder.

thanks, will add this into the texi.
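For reference, the texi addition could look something like this (draft
wording for the sdr2hdr section):

    @itemize
    @item
    Only 1920x1080 input is supported.
    @item
    RGB24 is the only supported input pixel format.
    @item
    No HDR metadata is generated, so the output cannot yet be encoded as
    a real HDR video.
    @end itemize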

> 
> (It would be good if any of them could be removed.)
> 
> > >
> > > Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > > ---
> > > configure                |   1 +
> > > doc/filters.texi         |  36 +++++++
> > > libavfilter/Makefile     |   1 +
> > > libavfilter/allfilters.c |   1 +
> > > libavfilter/vf_sdr2hdr.c | 268 +++++++++++++++++++++++++++++++++++++++++++++++
> > > 5 files changed, 307 insertions(+)
> > > create mode 100644 libavfilter/vf_sdr2hdr.c
> > >
> > > diff --git a/configure b/configure
> > > index b02b4cc..19138e8 100755
> > > --- a/configure
> > > +++ b/configure
> > > @@ -3446,6 +3446,7 @@ sab_filter_deps="gpl swscale"
> > > scale2ref_filter_deps="swscale"
> > > scale_filter_deps="swscale"
> > > scale_qsv_filter_deps="libmfx"
> > > +sdr2hdr_filter_deps="libtensorflow"
> > > select_filter_select="scene_sad"
> > > sharpness_vaapi_filter_deps="vaapi"
> > > showcqt_filter_deps="avcodec avformat swscale"
> > > diff --git a/doc/filters.texi b/doc/filters.texi
> > > index 0d9ff43..2e6a6af 100644
> > > --- a/doc/filters.texi
> > > +++ b/doc/filters.texi
> > > @@ -14868,6 +14868,42 @@ Scale a subtitle stream (b) to match the main video (a) in size before overlayin
> > > @end example
> > > @end itemize
> > >
> > > +@section sdr2hdr
> > > +
> > > +HDR image generation from a single exposure using deep CNNs with the TensorFlow C library.
> > > +
> > > +@itemize
> > > +@item
> > > +paper:  see @url{https://arxiv.org/pdf/1710.07480.pdf}
> > > +
> > > +@item
> > > +code with model and trained parameters: see @url{https://github.com/gabrieleilertsen/hdrcnn}
> > > +@end itemize
> > > +
> > > +The filter accepts the following options:
> > > +
> > > +@table @option
> > > +
> > > +@item model_filename
> > > +Set path to model file specifying network architecture and its
> > > +parameters, which can be downloaded from
> > > +@url{https://drive.google.com/drive/folders/1URsRY5g-VdE-kHlP5vQoLoimMIZ-SX00?usp=sharing}
> > > +
> > > +@item out_fmt
> > > +the data format of the filter's output.
> > > +
> > > +It accepts the following values:
> > > +@table @samp
> > > +@item gbrpf32le
> > > +force gbrpf32le output
> > > +
> > > +@item gbrp10le
> > > +force gbrp10le output
> > > +@end table
> > > +
> > > +Default value is @samp{gbrpf32le}.
> > > +
> > > +@end table
> > > +
> > > @anchor{selectivecolor}
> > > @section selectivecolor
> > >
> > > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > > index 7c6fc83..936a525 100644
> > > --- a/libavfilter/Makefile
> > > +++ b/libavfilter/Makefile
> > > @@ -365,6 +365,7 @@ OBJS-$(CONFIG_SOBEL_OPENCL_FILTER)           += vf_convolution_opencl.o opencl.o
> > > OBJS-$(CONFIG_SPLIT_FILTER)                  += split.o
> > > OBJS-$(CONFIG_SPP_FILTER)                    += vf_spp.o
> > > OBJS-$(CONFIG_SR_FILTER)                     += vf_sr.o
> > > +OBJS-$(CONFIG_SDR2HDR_FILTER)                += vf_sdr2hdr.o
> > > OBJS-$(CONFIG_SSIM_FILTER)                   += vf_ssim.o framesync.o
> > > OBJS-$(CONFIG_STEREO3D_FILTER)               += vf_stereo3d.o
> > > OBJS-$(CONFIG_STREAMSELECT_FILTER)           += f_streamselect.o framesync.o
> > > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > > index 484b080..622f9f3 100644
> > > --- a/libavfilter/allfilters.c
> > > +++ b/libavfilter/allfilters.c
> > > @@ -322,6 +322,7 @@ extern AVFilter ff_vf_scale_npp;
> > > extern AVFilter ff_vf_scale_qsv;
> > > extern AVFilter ff_vf_scale_vaapi;
> > > extern AVFilter ff_vf_scale2ref;
> > > +extern AVFilter ff_vf_sdr2hdr;
> > > extern AVFilter ff_vf_select;
> > > extern AVFilter ff_vf_selectivecolor;
> > > extern AVFilter ff_vf_sendcmd;
> > > diff --git a/libavfilter/vf_sdr2hdr.c b/libavfilter/vf_sdr2hdr.c
> > > new file mode 100644
> > > index 0000000..85a58ea
> > > --- /dev/null
> > > +++ b/libavfilter/vf_sdr2hdr.c
> > > @@ -0,0 +1,268 @@
> > > +/*
> > > + * Copyright (c) 2018 Guo Yejun
> > > + *
> > > + * This file is part of FFmpeg.
> > > + *
> > > + * FFmpeg is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU Lesser General Public
> > > + * License as published by the Free Software Foundation; either
> > > + * version 2.1 of the License, or (at your option) any later version.
> > > + *
> > > + * FFmpeg is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > + * Lesser General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU Lesser General Public
> > > + * License along with FFmpeg; if not, write to the Free Software
> > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> > > + * 02110-1301 USA
> > > + */
> > > +
> > > +/**
> > > + * @file
> > > + * Filter implementing HDR image generation from a single exposure using deep CNNs.
> > > + * https://arxiv.org/pdf/1710.07480.pdf
> > > + */
> > > +
> > > +#include "avfilter.h"
> > > +#include "formats.h"
> > > +#include "internal.h"
> > > +#include "libavutil/opt.h"
> > > +#include "libavutil/qsort.h"
> > > +#include "libavformat/avio.h"
> > > +#include "libswscale/swscale.h"
> > > +#include "dnn_interface.h"
> > > +#include <math.h>
> > > +
> > > +typedef struct SDR2HDRContext {
> > > +    const AVClass *class;
> > > +
> > > +    char* model_filename;
> > > +    enum AVPixelFormat out_fmt;
> > > +    DNNModule* dnn_module;
> > > +    DNNModel* model;
> > > +    DNNData input, output;
> > > +} SDR2HDRContext;
> > > +
> > > +#define OFFSET(x) offsetof(SDR2HDRContext, x)
> > > +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
> > > +static const AVOption sdr2hdr_options[] = {
> > > +    { "model_filename", "path to model file specifying network architecture and its parameters", OFFSET(model_filename), AV_OPT_TYPE_STRING, {.str=NULL}, 0, 0, FLAGS },
> > > +    { "out_fmt", "the data format of the filter's output, it could be gbrpf32le [default] or gbrp10le", OFFSET(out_fmt), AV_OPT_TYPE_PIXEL_FMT, {.i64=AV_PIX_FMT_GBRPF32LE}, AV_PIX_FMT_NONE, AV_PIX_FMT_NB - 1, FLAGS },
> > > +    { NULL }
> > > +};
> > > +
> > > +AVFILTER_DEFINE_CLASS(sdr2hdr);
> > > +
> > > +static av_cold int init(AVFilterContext* context)
> > > +{
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +
> > > +    if (ctx->out_fmt != AV_PIX_FMT_GBRPF32LE && ctx->out_fmt != AV_PIX_FMT_GBRP10LE) {
> > > +        av_log(context, AV_LOG_ERROR, "could not support the output format\n");
> > > +        return AVERROR(ENOSYS);
> > > +    }
> > > +
> > > +    ctx->dnn_module = ff_get_dnn_module(DNN_TF);
> > > +    if (!ctx->dnn_module){
> > > +        av_log(context, AV_LOG_ERROR, "could not create DNN module for tensorflow backend\n");
> > > +        return AVERROR(ENOMEM);
> > > +    }
> > > +    if (!ctx->model_filename){
> > > +        av_log(context, AV_LOG_ERROR, "model file for network was not specified\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +    if (!ctx->dnn_module->load_model) {
> > > +        av_log(context, AV_LOG_ERROR, "load_model for network was not specified\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +    ctx->model = (ctx->dnn_module->load_model)(ctx->model_filename);
> > > +    if (!ctx->model){
> > > +        av_log(context, AV_LOG_ERROR, "could not load DNN model\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +    return 0;
> > > +}
> > > +
> > > +static int query_formats(AVFilterContext* context)
> > > +{
> > > +    const enum AVPixelFormat in_formats[] = {AV_PIX_FMT_RGB24, AV_PIX_FMT_NONE};
> > > +    enum AVPixelFormat out_formats[2];
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +    AVFilterFormats* formats_list;
> > > +    int ret = 0;
> > > +
> > > +    formats_list = ff_make_format_list(in_formats);
> > > +    if ((ret = ff_formats_ref(formats_list, &context->inputs[0]->out_formats)) < 0)
> > > +        return ret;
> > > +
> > > +    out_formats[0] = ctx->out_fmt;
> > > +    out_formats[1] = AV_PIX_FMT_NONE;
> > > +    formats_list = ff_make_format_list(out_formats);
> > > +    if ((ret = ff_formats_ref(formats_list, &context->outputs[0]->in_formats)) < 0)
> > > +        return ret;
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int config_props(AVFilterLink* inlink)
> > > +{
> > > +    AVFilterContext* context = inlink->dst;
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +    AVFilterLink* outlink = context->outputs[0];
> > > +    DNNReturnType result;
> > > +
> > > +    // the dnn model is tied to the resolution due to the deconv layer of tensorflow
> > > +    // now just support 1920*1080, hence the magic numbers within this file
> > > +    if (inlink->w != 1920 || inlink->h != 1080) {
> > > +        av_log(context, AV_LOG_ERROR, "only support frame size with 1920*1080\n");
> > > +        return AVERROR(ENOSYS);
> > > +    }
> > > +
> > > +    ctx->input.width = 1920;
> > > +    ctx->input.height = 1088;  // the model requires the height to be a multiple of 32
> 
> It would be better to avoid any hard-coded values. I prefer something like:
> 
> ctx->input.width = inlink->w;
> ctx->input.height = FFALIGN(inlink->h, 32);
> 

will fix.
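Something like this, with the padding rows beyond inlink->h left at zero
by the later memset (untested sketch):

    ctx->input.width    = inlink->w;
    ctx->input.height   = FFALIGN(inlink->h, 32); // deconv layers need a multiple of 32
    ctx->input.channels = 3;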

> 
> > > +    ctx->input.channels = 3;
> > > +
> > > +    result = (ctx->model->set_input_output)(ctx->model->model, &ctx->input, &ctx->output);
> > > +    if (result != DNN_SUCCESS){
> > > +        av_log(context, AV_LOG_ERROR, "could not set input and output for the model\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +
> > > +    memset(ctx->input.data, 0, ctx->input.channels * ctx->input.width * ctx->input.height * sizeof(float));
> > > +    outlink->h = 1080;
> > > +    outlink->w = 1920;
> 
> And also here:
> outlink->h = inlink->h;
> outlink->w = inlink->w;
> 
> Then when one more resolution is supported, little code needs to change.
> 

will fix; I left the magic numbers in just as a clear note.
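Right; only the model input is padded to a multiple of 32, while the
output link keeps the original frame size (sketch):

    outlink->w = inlink->w;
    outlink->h = inlink->h;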

> > > +    return 0;
> > > +}
> > > +
> > > +static float qsort_comparison_function_float(const void *a, const void *b)
> > > +{
> > > +    return *(const float *)a - *(const float *)b;
> > > +}
> > > +
> > > +static int filter_frame(AVFilterLink* inlink, AVFrame* in)
> > > +{
> > > +    DNNReturnType dnn_result = DNN_SUCCESS;
> > > +    AVFilterContext* context = inlink->dst;
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +    AVFilterLink* outlink = context->outputs[0];
> > > +    AVFrame* out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> > > +    int total_pixels = in->height * in->width;
> > > +
> > > +    if (!out){
> > > +        av_log(context, AV_LOG_ERROR, "could not allocate memory for output frame\n");
> > > +        av_frame_free(&in);
> > > +        return AVERROR(ENOMEM);
> > > +    }
> > > +
> > > +    av_frame_copy_props(out, in);
> > > +
> > > +    for (int i = 0; i < in->linesize[0] * in->height; ++i) {
> > > +        ctx->input.data[i] = in->data[0][i] / 255.0f;
> > > +    }
> > > +
> > > +    dnn_result = (ctx->dnn_module->execute_model)(ctx->model);
> > > +    if (dnn_result != DNN_SUCCESS){
> > > +        av_log(context, AV_LOG_ERROR, "failed to execute loaded model\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +
> > > +    if (ctx->out_fmt == AV_PIX_FMT_GBRPF32LE) {
> > > +        float* outg = (float*)out->data[0];
> > > +        float* outb = (float*)out->data[1];
> > > +        float* outr = (float*)out->data[2];
> > > +        for (int i = 0; i < total_pixels; ++i) {
> > > +            float r = ctx->output.data[i*3];
> > > +            float g = ctx->output.data[i*3+1];
> > > +            float b = ctx->output.data[i*3+2];
> > > +            outr[i] = r;
> > > +            outg[i] = g;
> > > +            outb[i] = b;
> > > +        }
> > > +    } else {
> 
> Would it be better to change this to "else if (fmt == gbrp10le)" and add
> an assert in the else branch below?
> (I believe the format should be checked again even though it was already
> checked in the initialization stage.)

ok, will fix.
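The gbrp10le branch would then become explicit, with an assert as a guard
(sketch; av_assert0() comes from libavutil/avassert.h):

    } else if (ctx->out_fmt == AV_PIX_FMT_GBRP10LE) {
        // existing rough mapping to the 10bit contents stays here
    } else {
        av_assert0(0); // init() already rejected any other format
    }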

> > > +        // here, we just use a rough mapping to the 10bit contents
> > > +        // meta data generation for HDR video encoding is not supported yet
> > > +        float* converted_data = (float*)av_malloc(total_pixels * 3 * sizeof(float));
> > > +        int16_t* outg = (int16_t*)out->data[0];
> > > +        int16_t* outb = (int16_t*)out->data[1];
> > > +        int16_t* outr = (int16_t*)out->data[2];
> > > +
> > > +        float max = 1.0f;
> > > +        for (int i = 0; i < total_pixels * 3; ++i) {
> > > +            float d = ctx->output.data[i];
> > > +            d = sqrt(d);
> > > +            converted_data[i] = d;
> > > +            max = FFMAX(d, max);
> > > +        }
> > > +
> > > +        if (max > 1.0f) {
> > > +            AV_QSORT(converted_data, total_pixels * 3, float, qsort_comparison_function_float);
> > > +            // 0.5% pixels are clipped
> > > +            max = converted_data[(int)(total_pixels * 3 * 0.995)];
> > > +            max = FFMAX(max, 1.0f);
> > > +
> > > +            for (int i = 0; i < total_pixels * 3; ++i) {
> > > +                float d = ctx->output.data[i];
> > > +                d = sqrt(d);
> > > +                d = FFMIN(d, max);
> > > +                converted_data[i] = d;
> > > +            }
> > > +        }
> > > +
> > > +        for (int i = 0; i < total_pixels; ++i) {
> > > +            float r = converted_data[i*3];
> > > +            float g = converted_data[i*3+1];
> > > +            float b = converted_data[i*3+2];
> > > +            outr[i] = r / max * 1023;
> > > +            outg[i] = g / max * 1023;
> > > +            outb[i] = b / max * 1023;
> > > +        }
> > > +
> > > +        av_free(converted_data);
> > > +    }
> > > +
> > > +    av_frame_free(&in);
> > > +    return ff_filter_frame(outlink, out);
> > > +}
> > > +
> > > +static av_cold void uninit(AVFilterContext* context)
> > > +{
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +
> > > +    if (ctx->dnn_module){
> > > +        (ctx->dnn_module->free_model)(&ctx->model);
> > > +        av_freep(&ctx->dnn_module);
> > > +    }
> > > +}
> > > +
> > > +static const AVFilterPad sdr2hdr_inputs[] = {
> > > +    {
> > > +        .name         = "default",
> > > +        .type         = AVMEDIA_TYPE_VIDEO,
> > > +        .config_props = config_props,
> > > +        .filter_frame = filter_frame,
> > > +    },
> > > +    { NULL }
> > > +};
> > > +
> > > +static const AVFilterPad sdr2hdr_outputs[] = {
> > > +    {
> > > +        .name = "default",
> > > +        .type = AVMEDIA_TYPE_VIDEO,
> > > +    },
> > > +    { NULL }
> > > +};
> > > +
> > > +AVFilter ff_vf_sdr2hdr = {
> > > +    .name          = "sdr2hdr",
> > > +    .description   = NULL_IF_CONFIG_SMALL("HDR image generation from a single exposure using deep CNNs."),
> > > +    .priv_size     = sizeof(SDR2HDRContext),
> > > +    .init          = init,
> > > +    .uninit        = uninit,
> > > +    .query_formats = query_formats,
> > > +    .inputs        = sdr2hdr_inputs,
> > > +    .outputs       = sdr2hdr_outputs,
> > > +    .priv_class    = &sdr2hdr_class,
> > > +    .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC,
> > > +};
> > > --
> > > 2.7.4
> > >
> > > _______________________________________________
> > > ffmpeg-devel mailing list
> > > ffmpeg-devel at ffmpeg.org
> > > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > I have tested this patch and have some questions:
> >
> > 1. It must use TensorFlow; why not support the native mode?

the model file includes nearly 20 ops, while only two ops are supported in native mode.

> > 2. It only accepts 1920x1080 input; can support for more resolutions be
> > added?
> 
> Yup, it would be better if we could support more resolutions.
> 

With the current TensorFlow, the model file is tied to one resolution.
A trick is to generate a model file for each resolution, but that is not friendly to the user.
The nicer solution is to fix TensorFlow first, so that a single model file can support all resolutions.

> > 3. I looked into the hdrcnn project; its license is BSD-3-Clause. Will
> > this cause a license problem?
> 
> Should not be a problem, since the patch doesn't include any source code
> from that project.

I think so, thanks.

> >
> > Thanks
> >
> > Steven
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel at ffmpeg.org
> > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

